Logical Methods in Computer Science 
Vol. 8(3:29)2012, pp. 1-35 
www.lmcs-online.org 



Submitted Oct. 17, 2011 
Published Sep. 30, 2012 



Invariant Generation through Strategy Iteration in Succinctly Represented Control Flow Graphs

THOMAS MARTIN GAWLITZA (a) AND DAVID MONNIAUX (b)

(a) School of Information Technologies, The University of Sydney, Australia
e-mail address: gawlitza@it.usyd.edu.au

(b) CNRS / VERIMAG Laboratory, Centre Equation, 2 avenue de Vignate, 38610 Gières, France
e-mail address: David.Monniaux@imag.fr 



Abstract. We consider the problem of computing numerical invariants of programs, for 
instance bounds on the values of numerical program variables. More specifically, we study 
the problem of performing static analysis by abstract interpretation using template linear 
constraint domains. Such invariants can be obtained by Kleene iterations that are, in order 
to guarantee termination, accelerated by widening operators. In many cases, however, 
applying this form of extrapolation leads to invariants that are weaker than the strongest 
inductive invariant that can be expressed within the abstract domain in use. Another 
well-known source of imprecision of traditional abstract interpretation techniques stems 
from their use of join operators at merge nodes in the control flow graph. The mentioned 
weaknesses may prevent these methods from proving safety properties. 

The technique we develop in this article addresses both of these issues: contrary to 
Kleene iterations accelerated by widening operators, it is guaranteed to yield the strongest 
inductive invariant that can be expressed within the template linear constraint domain in 
use. It also eschews join operators by distinguishing all paths of loop-free code segments. 
Formally speaking, our technique computes the least fixpoint within a given template linear 
constraint domain of a transition relation that is succinctly expressed as an existentially 
quantified linear real arithmetic formula. 

In contrast to previously published techniques that rely on quantifier elimination, our
algorithm is proved to have optimal complexity: we prove that the decision problem associated
with our fixpoint problem is $\Pi^p_2$-complete. Our procedure mimics a $\Pi^p_2$ search.

1. Introduction

Static program analysis aims at deriving properties that are valid for all possible executions 
of a program, through an algorithmic processing of its source or object code. Examples 
of interesting properties include: "the program always terminates"; "the program never 
executes a division by zero" ; "the program never dereferences a null pointer" ; "the value of 
variable x always lies between 1 and 3" ; "the output of the program is well-formed XHTML" . 
There is considerable practical interest in being able to prove such properties automatically, 
in particular for software used in safety-critical applications, e.g., in fly-by-wire flight control
systems in aircraft [60].

1.1. Abstract interpretation. It is well-known that fully automatic, sound and complete
program analysis is impossible for any nontrivial property regarding the final output of a
program.¹ All analysis methods therefore suffer from at least one of the following limitations:
they may be limited to programs with finite (and not too large) memory, or to bounded
execution times; they may be unsound (they may infer untrue properties); or they may be
incomplete (they fail to prove certain true properties). In this article, we use the abstract
interpretation framework of Cousot and Cousot [18] to construct a static analysis technique
that is sound, but incomplete.

Static analysis by abstract interpretation replaces the computation over concrete reachable
states by computations over symbolically represented sets of concrete states. The sets
are taken from an abstract domain. For instance, one may aim at computing, for each
program point p and each program variable x, an interval in which the value of x is
guaranteed to lie whenever the program reaches program point p. An analysis solely based
on such intervals is known as interval analysis [17]. More refined numerical analyses
include, for instance, finding for each program point an enclosing polyhedron for the vector
of program variables [19]. By restricting the analysis to handle only sets found within a
particular abstract domain (e.g., Cartesian products of intervals or convex polyhedra), one
can make the problem tractable, at the expense of over-approximation. For instance, if the
domain in use consists only of convex shapes, non-convex invariants will necessarily get
over-approximated.

In addition to the abstract domain not being able to represent the required properties,
a major source of imprecision is the use of widening operators to enforce the convergence of
Kleene iterations within finitely many iteration steps [18]. These operators extrapolate the
first iterates of the Kleene sequence, say, of the intervals [0, 1], [0, 2], [0, 3], ... to a plausible
limit, say $[0, +\infty)$, ensuring termination of the accelerated iteration. However, such an
accelerated iteration may overshoot the target, leading to further over-approximations of the
desired result. In order to regain precision lost by widening, one can then apply narrowing.
In its simplest form, narrowing is a descending iteration towards a fixpoint that strengthens
the invariant step by step. For more detailed information on Kleene iteration techniques
in the context of abstract interpretation, we refer the reader to Cousot and Cousot [18].
Many variants of this basic iteration scheme have been proposed to alleviate the
over-approximations introduced by widening [31] [35] [38]. However, none of these techniques
is guaranteed to find the strongest inductive invariant that can be expressed in the abstract
domain in use.



¹ This result, formally given within the framework of recursive function theory, is known as
Rice's theorem [51, p. 34] [52, corollary B].

Let us illustrate the above-mentioned weaknesses on the following simple example:

    i = 0;
    while (true) {
      if (i < 10) i = i+2;
      else goto _end; }
    _end: printf("i = %d\n", i);

The strongest invariant, that is, the set of reachable states, is given by the proposition
$i \in \{0, 2, 4, 6, 8, 10\}$, which, together with the exit condition $i \ge 10$, yields $i = 10$
as the only possible final value of i at program point _end. Interval analysis by Kleene iterations
with widenings computes the intervals [0, 0], [0, 2], [0, 4] and may then widen to $[0, +\infty)$.
The narrowing phase yields the inductive invariant $i \in [0, 11]$. From this we can conclude
that the final value of i is in the interval [10, 11]. The obtained interval [0, 11] represents the
strongest inductive invariant that can be expressed as an interval.² It is, however, not the
strongest invariant expressible as an interval, which is $i \in [0, 10]$. The invariant
$i \in [0, 10]$ is not inductive, because a state with i = 9 is mapped to a state with i = 11 by
one iteration of the loop.
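
The iteration just described can be replayed with a small interval analyzer. The following Python sketch is an illustration only, not the analysis developed in this article. It assumes an integer-valued i (so that the guard i < 10 is the bound i <= 9), delays widening for three steps so as to reproduce the iterates quoted above, and uses plain descending iteration as its narrowing. All function names are hypothetical.

```python
import math

def join(a, b):
    """Least upper bound of two intervals; None is the empty interval."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))

def meet(a, b):
    """Intersection of two intervals."""
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

def widen(a, b):
    """Standard interval widening: unstable bounds jump to infinity."""
    if a is None:
        return b
    if b is None:
        return a
    lo = a[0] if a[0] <= b[0] else -math.inf
    hi = a[1] if a[1] >= b[1] else math.inf
    return (lo, hi)

def step(x):
    """Abstract loop head: entry with i = 0, joined with 'i < 10; i = i+2'."""
    guarded = meet(x, (-math.inf, 9))          # integer i: i < 10 means i <= 9
    moved = None if guarded is None else (guarded[0] + 2, guarded[1] + 2)
    return join((0, 0), moved)

def analyze(widening_delay=3, max_narrowing_steps=10):
    x, ascending, steps = None, [], 0
    while True:                                 # ascending phase with widening
        nxt = step(x)
        if steps >= widening_delay:
            nxt = widen(x, nxt)
        if nxt == x:
            break
        x = nxt
        ascending.append(x)
        steps += 1
    for _ in range(max_narrowing_steps):        # descending (narrowing) phase
        nxt = step(x)
        if nxt == x:
            break
        x = nxt
    return ascending, x

ascending, invariant = analyze()
exit_interval = meet(invariant, (10, math.inf))  # exit condition i >= 10
```

Under these assumptions the ascending iterates are [0, 0], [0, 2], [0, 4] and then the widened interval, and the descending phase stabilizes on [0, 11], which restricted to the exit condition gives [10, 11].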

Unfortunately, small changes to the above program can make the widening/narrowing 
approach fail to produce a good invariant. Consider, for instance, the introduction of an 
additional non-deterministic choice, represented by the function choice():

    i = 0;
    while (true) {
      if (choice()) {
        if (i < 10) i = i+2;
        else goto _end; } }
    _end: printf("i = %d\n", i);

The program still outputs the value 10, whenever it terminates. The only difference from
the first version of the program is that there is, in each iteration, a non-deterministic choice
whether or not the original loop body is to be executed. If we perform the widening/narrowing
technique on the modified version, the widening phase will produce the same result
$[0, +\infty)$. However, the narrowing phase is now not able to regain any precision lost due
to widening. The loop body represents the relation
$\tau = \{(i, i) \mid i \in \mathbb{Z}\} \cup \{(i, i+2) \mid i \in \mathbb{Z} \text{ and } i < 10\}$.
This relation is reflexive, that is, $(i, i) \in \tau$ for all $i \in \mathbb{Z}$. The problem is
of a general nature: whenever the transition relation $\tau$ of a loop is reflexive, descending
iterations fail to improve the inductive invariant obtained by widening.
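
The effect of reflexivity on the descending phase can be checked directly. The sketch below is a hedged illustration with hypothetical names, again assuming an integer-valued i: it applies one descending step to the widened interval, once for the original loop body and once for the body extended with the identity transition.

```python
import math

def join(a, b):
    """Least upper bound of two non-empty intervals."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def guarded_increment(x):
    """Image of interval x under 'i < 10; i = i + 2' (integer i), None if empty."""
    hi = min(x[1], 9)
    return None if x[0] > hi else (x[0] + 2, hi + 2)

def step_plain(x):
    """First program: every iteration runs the guarded increment."""
    inc = guarded_increment(x)
    return (0, 0) if inc is None else join((0, 0), inc)

def step_with_choice(x):
    """Second program: an iteration may also leave i unchanged (identity)."""
    return join(x, step_plain(x))

widened = (0, math.inf)
after_plain = step_plain(widened)          # descending step, first program
after_choice = step_with_choice(widened)   # descending step, second program
```

Because the identity transition maps the widened interval to itself, the join in `step_with_choice` can never return anything smaller than its argument, which is exactly why the descending iteration stalls.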

Of course, on such a simple example, one could use simple tricks to get rid of the 
imprecision and recover the interval [0, 11]: remove the identity from the transition relation 
(this does not change the set of all (inductive) invariants), or try a form of widening with 

² Some presentations of Hoare logic or static analysis call "invariant" what we refer to in this
article as "inductive invariant": a set (or a logical formula defining such a set) containing all
initial states and stable by the transition relation. In our terminology, an invariant is merely a
property true at all times. With these definitions, an inductive invariant is an invariant by
induction on the length of the execution trace, thus the terminology; however an invariant is not
necessarily inductive. Consider the initial state $(x, y) = (1, 0)$ and a transition consisting in a
45° clockwise rotation around (0, 0): $(x, y) \in [-1, 1] \times [-1, 1]$ is an invariant (it is
always true), but it is not inductive because $[-1, 1] \times [-1, 1]$ is not stable by this rotation.

thresholds, also known as widening "up to" [39]. However, such approaches are brittle and
may fail for more complex programs.

1.2. Alternatives to the widening/narrowing approach. Because of the known weak- 
nesses of the widening/narrowing approach, alternative methods have been proposed. Find- 
ing an inductive invariant in an abstract domain can be recast as solving a constraint system. 
Finding the strongest inductive invariant is then the problem of finding a minimal solution 
to the constraint system. The technique described in this article is related to two recently 
proposed approaches, which we shall now briefly describe. 

Quantifier elimination. Monniaux [48] considers abstract domains where elements are defined
by a logical formula $I$ (more specifically, a conjunction of linear inequalities) that links
the program variables to some parameters. For instance, intervals on two variables x, y are
defined by $I := -l_x \le x \le u_x \wedge -l_y \le y \le u_y$, where $l_x, u_x, l_y, u_y$ are the
parameters. An element from the abstract domain defined by the template $I$ is specified by an
assignment of values to the parameters.

Consider a set of initial states given by a formula $\iota$ (in the above example, with free
variables $\sigma = (x, y)$) and a transition relation given by $\tau$ (in the above example, with
free variables $(\sigma, \sigma') = (x, y, x', y')$). $I$ defines an inductive invariant for $\iota$
and $\tau$ if and only if

\[ \bigl(\forall \sigma.\ \iota(\sigma) \Rightarrow I(\sigma)\bigr) \wedge \bigl(\forall \sigma, \sigma'.\ I(\sigma) \wedge \tau(\sigma, \sigma') \Rightarrow I(\sigma')\bigr). \tag{1.1} \]

Here, $I(\sigma)$ is the formula $I$ as above and $I(\sigma')$ is the formula $I$ with $\sigma$
replaced by $\sigma'$.



The free variables of formula (1.1) are the parameters in $I$. In the above example, they
are $l_x, u_x, l_y, u_y$. Any satisfying assignment to these variables defines an inductive
invariant from the abstract domain. A least inductive invariant in the abstract domain is then
defined by constructing, using formula (1.1) as a building block, a formula whose solution is the
minimal solution of (1.1), using that, for any formula $F$, $x_0 = \min\{x \mid F(x)\}$ if and only if

\[ F(x_0) \wedge \forall x.\ \bigl(F(x) \Rightarrow x_0 \le x\bigr). \tag{1.2} \]

The static analyzer then proceeds as follows: transform the loop into a set of initial states
$\iota$ and a transition relation $\tau$. From these formulas, construct formula (1.2). Then, call
a solver capable of dealing with quantified formulas, e.g., a quantifier elimination procedure
or a lazy version thereof such as the one developed by Monniaux.
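
To make conditions (1.1) and (1.2) concrete, the following sketch checks them by exhaustive enumeration for the one-variable loop of Section 1.1. It is only an illustration under strong assumptions: the state space is a small finite set of integers rather than the reals, the template is a single interval [lo, hi], and a brute-force search stands in for the quantified-formula solver. The names `initial`, `transition`, `is_inductive` and `least_inductive_interval` are hypothetical.

```python
def initial(i):
    """Initial states: the loop is entered with i = 0."""
    return i == 0

def transition(i, j):
    """Transition relation of one iteration of 'if (i < 10) i = i + 2'."""
    return i < 10 and j == i + 2

def is_inductive(lo, hi, universe):
    """Condition (1.1) for the interval lo <= i <= hi, checked by enumeration."""
    inv = lambda i: lo <= i <= hi
    contains_init = all(inv(i) for i in universe if initial(i))
    stable = all(inv(j)
                 for i in universe if inv(i)
                 for j in universe if transition(i, j))
    return contains_init and stable

def least_inductive_interval(universe):
    """Condition (1.2): an inductive interval included in all inductive ones."""
    candidates = [(lo, hi) for lo in universe for hi in universe
                  if lo <= hi and is_inductive(lo, hi, universe)]
    for c in candidates:
        if all(d[0] <= c[0] and c[1] <= d[1] for d in candidates):
            return c
    return None

UNIVERSE = range(-3, 20)   # finite stand-in for the state space (assumption)
least = least_inductive_interval(UNIVERSE)
```

On this finite universe, [0, 10] fails the stability half of (1.1) because i = 9 steps to 11, and the search returns [0, 11], matching the discussion in Section 1.1.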

As an extension to this framework, $\iota$ and $\tau$ may have additional variables, e.g.,
precondition or system parameters. The formula defining the least inductive invariant will
then take the invariant parameters as a partial function (in the mathematical sense, that
is, as a binary relation relating each input to at most one output) of these precondition or
system parameters. By quantifier elimination and further processing of the formula, it is possible
to turn this formula into a closed-form function, and even into executable code computing
that function (a tree of if-then-else statements with assignments at the leaves).

This approach makes it possible to effectively synthesize best abstract transformers
($\alpha \circ \tau \circ \gamma$ in the notation of Cousot and Cousot [18]). Unfortunately,
quantifier elimination over linear real arithmetic is still very costly, despite the various recent
works on this problem, and quantifier elimination over linear integer arithmetic and polynomial
real arithmetic are even costlier.

The technique described in this article considers the same problem as the quantifier
elimination approach, but without preconditions or system parameters. Our technique also 
uses a different algorithmic approach, called max-strategy iteration. 

Strategy Iteration. In this article, we introduce a refinement of the max-strategy iteration
technique of Gawlitza and Seidl [26] for template linear constraint domains. The phrase
"strategy iteration", also known as "policy iteration", comes from game theory. Let us
consider two-player zero-sum games: the outcome of such a game is a real number, and the two
players (the maximizer and the minimizer) aim at maximizing (respectively, minimizing) the
outcome. Strategy iteration is a method for computing the optimal strategy for one of the
players. It successively improves a strategy through the following two steps until an optimal
strategy is found: (Evaluation) evaluate the currently selected strategy; and (Improvement)
try to improve the currently selected strategy w.r.t. the result of the evaluation.

The max-strategy iteration technique of Gawlitza and Seidl [21] for finding invariants
is inspired by this game-theoretic approach. Instantiated on template linear constraint
domains, it computes the strongest inductive invariant that can be represented by polyhedra
of the form $P(b) = \{x \in \mathbb{R}^n \mid Tx \le b\}$, where $T \in \mathbb{R}^{m \times n}$ is a
template constraint matrix, which is fixed before the analysis is run (heuristics for finding a
suitable matrix are out of scope for this article). The variable $x$ is the vector of program
variables. The template constraint matrix $T$ is the counterpart of the template $I$ from the
quantifier elimination technique of Monniaux [48]. Given $T$, every vector
$b \in \overline{\mathbb{R}}^m$ uniquely determines a polyhedron $P(b)$. The vector $b$ contains the
bounds on the linear functions that are represented by the rows of $T$. With the appropriate
choice of $T$ we can, among others, express the popular interval and octagon [HI 06] abstract
domains.
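
As a small illustration of how a template matrix determines the shape of the representable sets, the sketch below encodes an interval template and an octagon template for two variables and computes, for a finite set of points, the least bound vector b whose polyhedron P(b) contains them. This is not the strategy iteration algorithm itself, only the underlying representation; the constants and function names are hypothetical, and exact rational arithmetic is used for convenience.

```python
from fractions import Fraction

# Rows are linear functions of (x, y); b bounds each of them from above.
INTERVAL_2D = [(1, 0), (-1, 0), (0, 1), (0, -1)]                   # x, -x, y, -y
OCTAGON_2D = INTERVAL_2D + [(1, 1), (1, -1), (-1, 1), (-1, -1)]    # plus +-x +-y

def dot(row, x):
    return sum(Fraction(a) * Fraction(v) for a, v in zip(row, x))

def in_polyhedron(T, b, x):
    """Membership test for P(b) = {x | T x <= b}, row by row."""
    return all(dot(row, x) <= bi for row, bi in zip(T, b))

def tightest_bounds(T, points):
    """Least b such that P(b) contains all points: b_i = max of T_i . p."""
    return [max(dot(row, p) for p in points) for row in T]

points = [(0, 0), (2, 1), (1, 3)]
b_interval = tightest_bounds(INTERVAL_2D, points)
b_octagon = tightest_bounds(OCTAGON_2D, points)
```

The point (2, 3) lies in the interval polyhedron but violates the octagon row x + y <= 4, which shows how adding rows to the template refines the abstraction.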

Similarly to Kleene iterations, the max-strategy improvement algorithm produces an 
ascending sequence of pre-fixpoints that are less than or equal to the least inductive invariant 
we are aiming for. The pre-fixpoints are obtained through convex optimization techniques, 
e.g., linear programming. In contrast to Kleene iterations, though, the algorithm converges 
to the least inductive invariant after at most exponentially many steps. Our conjecture is 
that it usually converges fast in practice, though one can concoct artificial examples that 
exhibit exponential behavior. 

Trace partitioning. Max-strategy iteration rids us of imprecisions introduced by widening, 
but, per se, does not remove imprecisions introduced by another operation: the merging 
of information from different program paths at join nodes in the control flow graph. In 
this article, we introduce a refinement of max-strategy iteration where we distinguish the 
various execution paths, in a manner similar to the work of Monniaux [48] and Monniaux
and Gonnord [49].

In most systems for static analysis by abstract interpretation, joins in the control-flow
graph result in computations of least upper bounds in the abstract domain. For instance, 
consider abstract interpretation over general convex polyhedra on the following program: 

    if (x >= 0) y = x;
    else y = -x;
    if (y >= 1) z = 3.5/x;

The program divides 3.5 by the value of x provided that the absolute value of x is at least 1.
A static analyzer that uses convex polyhedra as abstract domain may work as follows.

Figure 1: On the left: the graph of $y = |x|$ is the union of two half-lines, but computing their
convex hull yields the grayed shape. By intersection with $y \ge 1$, we obtain the
shape on the right, which contains points with $x = 0$ even though $y = |x| \wedge y \ge 1$
has no solution with $x = 0$.

Figure 2: Instead of considering two transitions (corresponding to a first if-then-else)
followed by convex hull followed by two transitions (corresponding to a second
if-then-else), as on the left, we get better precision by considering the four product
transitions, as on the right.

After the first if-then-else statement, a convex hull is computed between the
$x \ge 0 \wedge y = x$ and $x < 0 \wedge y = -x$ half-lines, resulting in a much larger polyhedron
(see Fig. 1). The imprecision introduced by this operation prevents the analyzer from proving
that a division by zero at line 3 is impossible.
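
The loss of precision can be exhibited with a single witness point. The sketch below (hypothetical names, exact rational arithmetic) takes one point on each half-line of the graph of y = |x| and forms their midpoint, which any convex over-approximation of the two half-lines must contain.

```python
from fractions import Fraction

def on_abs_graph(p):
    """True if the point satisfies y = |x|, i.e. lies on one of the half-lines."""
    x, y = p
    return y == abs(x)

def convex_combination(p, q, t):
    """The point (1 - t) * p + t * q on the segment between p and q."""
    return tuple((1 - t) * a + t * b for a, b in zip(p, q))

left = (Fraction(-1), Fraction(1))    # on the half-line x < 0, y = -x
right = (Fraction(1), Fraction(1))    # on the half-line x >= 0, y = x
witness = convex_combination(left, right, Fraction(1, 2))
```

The witness is the point (0, 1): it satisfies the guard y >= 1 with x = 0 although it is not on the graph of y = |x|, so an analyzer working on the convex hull cannot exclude the division by zero.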

One solution is to get rid of all convex hulls corresponding to control flow joins by
removing all control flow joins, except those corresponding to loop headers, by combining
control flow edges. For instance, n successive if-then-else constructs can be turned into an
expanded system of $2^n$ transitions (Figure 2 shows this construction for n = 2). This is close
to the trace partitioning approach of Rival and Mauborgne [53].³ One could therefore run
this exponential transformation first, and then run max-strategy iteration or min-strategy
iteration (Sec. 1.4). However, this transformation causes an exponential blowup and is
therefore clearly not scalable.
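
The size of the expanded system is easy to see on a toy encoding. In the sketch below, each if-then-else construct is a pair of branch labels (plain strings chosen for illustration, not a real program representation), and the explicit expansion enumerates every combination of branches. The function name `expand_paths` is hypothetical.

```python
from itertools import product

def expand_paths(conditionals):
    """All branch combinations of n successive if-then-else constructs.

    conditionals is a list of (then_branch, else_branch) pairs; the result
    has 2**n entries, one per path through the loop-free code.
    """
    return [list(choice) for choice in product(*conditionals)]

program = [
    ("x >= 0; y = x", "x < 0; y = -x"),
    ("y >= 1; z = 3.5/x", "y < 1"),
]
paths = expand_paths(program)
ten_tests = expand_paths([("then", "else")] * 10)
```

Two conditionals already give four product transitions, and ten give 1024. Note that `itertools.product` is itself lazy, so iterating over it directly, instead of materializing the list, is one simple way to enumerate paths only on demand.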



³ Trace partitioning analyses each program statement in different contexts according to an
abstraction of the history of the control trace; thus, if a statement is preceded by n tests, it can
potentially analyze this statement in $2^n$ contexts. Because of this exponential blowup of maximal
partitioning, trace partitioning techniques, including those implemented in Astrée [HUH], use
heuristics to "fold" abstract elements together using join operations.

In this article, we describe an algorithm that yields the same result as max-strategy
iteration on this exponentially larger system. Our algorithm uses only polynomial space. It 
achieves this by keeping the exponentially large system implicit. 



Path focusing. Henry et al. [40] and Monniaux and Gonnord [49] propose to run the classical
Kleene iterations with widening and narrowing scheme not on the original control-flow 
graph, but on this exponentially larger system. In this approach, iterations are run on a 
distinguished subset of the original control nodes, such that all cycles in the original control 
flow graphs cross at least one of these distinguished nodes, using transitions corresponding 
to the simple paths between these distinguished nodes in the original control flow graph. 
The expanded control multigraph is kept implicit: the transitions, corresponding to simple 
paths in the original graph, are obtained on demand as solutions to SMT problems. This 
approach has the following advantages: 

(1) It fully does away with imprecisions introduced by "join" operations, except those 
corresponding to loops. 

(2) The transition relations on the simple paths may be accelerable. That is, they can be
dealt with through acceleration techniques (cf. Sec. 1.4, [32] [33] [H3]).
(3) While it uses widening operators, it does away with some of the imprecisions they 
introduce by focusing on one path at a time, which allows the use of narrowing iterations 
even on programs where they fail to yield better precision with the classical iteration 
scheme. 

The technique we present in this article combines the idea of implicit representation with 
max-strategy iteration. 



1.3. Contributions. The main contribution of this article is an algorithm that computes
the strongest inductive invariant of the expanded transition system (which allows higher
precision for abstract interpretation) without actually constructing it. We shall see later the
exact definition, but here is an interesting particular case (the general result allows more
complex control flow): given an $m \times n$ matrix $A$, an initial value $\iota \in \mathbb{Q}^n$
and a transition relation $\tau$ over $\mathbb{Q}^n$, defined by a formula over variables
$x_1, \ldots, x_n, x'_1, \ldots, x'_n$, built with non-strict linear (in)equalities, $\wedge$, $\vee$
and prenex $\exists$, compute the least set of the form
$P(b) = \{x \in \mathbb{R}^n \mid Ax \le b\}$ (that is, compute $b$) containing $\iota$ and stable by
the transition relation $\tau$; equivalently, find the least loop invariant of the form
$Ax \le b$ for the loop with initial state $\iota$ and loop body expressed by $\tau$.

Our algorithm can be performed in polynomial space and exponential time. It works 
in a demand-driven fashion: elements from the exponentially-sized sets of strategies and 
loop-free paths are enumerated only as needed, and one can thus hope that they will not 
all be enumerated, which seems to be confirmed by our preliminary experiments. 

We also consider the following associated decision problem, which we shall later make
more formal:

"Given a control-flow graph (with N vertices) and transition relations written
as existentially quantified first-order linear real arithmetic formulas, a
family $A_1, \ldots, A_N$ of matrices, an initial control state and a "bad" control
state b, do there exist vectors $b_1, \ldots, b_N$ such that
$A_1 x \le b_1 \wedge \cdots \wedge A_N x \le b_N$ forms an inductive invariant proving that b is
unreachable?".

We show this problem to be $\Sigma^p_2$-complete (at the second level of the polynomial time
hierarchy [50, ch. 17]), even if N = 1 and the matrix is 1 x 1. Equivalently, the negated problem
(abstract reachability of a statement) is shown to be $\Pi^p_2$-complete. Assuming the
polynomial hierarchy does not collapse, this means that this problem can be solved in polynomial
space, but is harder than NP-complete and coNP-complete problems. This clearly justifies
the use of an exponential-time algorithm.



1.4. Other related work. Many approaches have been proposed to address the imprecisions
caused by widening operators. We now briefly describe approaches related to ours,
in addition to those that we directly build upon (Sec. 1.2). Halbwachs et al. [39] proposed
widening "up to" (an idea resurrected in the Astrée system as widening with thresholds
[HE]), which extracts syntactic hints for limiting widening. Bagnara et al. [HE] proposed
improvements over the "classical" widenings on linear constraint domains [37]. Gopan and
Reps introduced "look-ahead widening" [31] and "guided iterations" [35]: standard
widening-based analysis is applied to a sequence of syntactic restrictions of the original
program, which ultimately converges to the whole program; the idea is to distinguish phases or
modes of operation in order to make the widening more precise. Some other techniques fully
do away with widenings jl3[ \TE[ 155], for instance by expressing the invariants as solutions
of a mathematical programming problem [36], and thus the least invariant in the domain
as an optimal solution to this problem.

In some cases, it is possible to compute exactly the transitive closure of the transition
relation, or the application of the transitive closure to given initial states, or at least to
compute a good over-approximation thereof. Such acceleration techniques [32] [33] [US] tend
to have difficulties dealing with programs where the control flow is not flat (multiple paths
within the loop body).

In Section 1.2, we sketched max-strategy iteration by an analogy to solving games
where "max" operations correspond to control-flow joins and "min" operations to guards.
If instead of choosing arguments to "max" operators, the strategy chooses them for "min"
operators, we obtain min-strategy iteration [14] [24]. Min-strategy iteration solves a sequence
of fixpoint problems with decreasing values always weaker or equivalent to the strongest
inductive invariant in the domain. In general, this sequence does not necessarily converge to
this least inductive invariant, but it does so under certain conditions (e.g., when all abstract
transformers are non-expansive [1]). We investigated applying our "implicit representation"
idea to the min-strategy approach, but encountered a stumbling block: while it is possible to
decide whether a max-strategy is improvable using SMT solving on quantifier-free formulas,
the equivalent for min-strategies necessitated quantified formulas, which defeats the purpose
of doing away with quantifier elimination techniques.



2. Basics 



2.1. Notations. $\mathbb{B} = \{0, 1\}$ denotes the set of Boolean values. The set of real numbers
(resp. the set of rational numbers) is denoted by $\mathbb{R}$ (resp. $\mathbb{Q}$). The complete
linearly ordered set $\mathbb{R} \cup \{-\infty, \infty\}$ is denoted by $\overline{\mathbb{R}}$;
similarly, $\mathbb{Q} \cup \{-\infty, \infty\}$ is denoted by $\overline{\mathbb{Q}}$. For any
expression (resp. term) $e$, we write $e[e_1/x_1, \ldots, e_k/x_k]$ to denote the expression
(resp. term) that is obtained from $e$ by simultaneously replacing all occurrences of the
variables $x_1, \ldots, x_k$ by $e_1, \ldots, e_k$.

A partially ordered set $D$ is called a lattice if and only if any two elements $x, y \in D$
have a greatest lower bound and a least upper bound, denoted respectively by $x \wedge y$ and
$x \vee y$. It is a complete lattice if and only if any subset $X \subseteq D$ has a greatest lower
bound and a least upper bound, denoted by $\bigwedge X$ and $\bigvee X$. The least element
$\bigvee \emptyset$ of a complete lattice is denoted by $\bot$. The greatest element
$\bigwedge \emptyset$ is denoted by $\top$.

Assume that $D_1$ and $D_2$ are partially ordered by $\le_1$ and $\le_2$, respectively. A function
$f : D_1 \to D_2$ is called monotone if and only if $f(x) \le_2 f(y)$ for all $x, y \in D_1$ with
$x \le_1 y$. We shall often use the following fundamental result:

Theorem 2.1 (Knaster/Tarski [IHl]). Let $D$ be a complete lattice and $f : D \to D$ monotone.
The operator $f$ has a least fixpoint and a greatest fixpoint, respectively denoted by $\mu f$ and
$\nu f$. Moreover, we have $\mu f = \bigwedge \{x \in D \mid f(x) \le x\}$ and
$\nu f = \bigvee \{x \in D \mid x \le f(x)\}$.
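
The characterization in Theorem 2.1 can be checked by brute force on a small finite lattice. The sketch below uses the powerset of a five-element set ordered by inclusion and a monotone operator built from a hypothetical edge relation; it compares the least fixpoint obtained by Kleene iteration from the bottom element with the meet of all pre-fixpoints, and computes the greatest fixpoint as the join of all post-fixpoints.

```python
from itertools import combinations

UNIVERSE = frozenset(range(5))
EDGES = {(0, 1), (1, 2), (3, 3), (3, 4)}    # hypothetical successor relation

def f(s):
    """Monotone operator on subsets: node 0 plus all successors of s."""
    return frozenset({0}) | frozenset(v for (u, v) in EDGES if u in s)

def all_subsets(universe):
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def kleene_lfp(op):
    """Iterate from the bottom element (the empty set) until stable."""
    x = frozenset()
    while op(x) != x:
        x = op(x)
    return x

def tarski_lfp(op):
    """Meet (intersection) of all x with op(x) <= x."""
    return frozenset.intersection(*[x for x in all_subsets(UNIVERSE) if op(x) <= x])

def tarski_gfp(op):
    """Join (union) of all x with x <= op(x)."""
    return frozenset.union(*[x for x in all_subsets(UNIVERSE) if x <= op(x)])

lfp_by_iteration = kleene_lfp(f)
lfp_by_theorem = tarski_lfp(f)
gfp_by_theorem = tarski_gfp(f)
```

Here the least fixpoint is the set of nodes reachable from 0, while the greatest fixpoint is the whole universe because of the self-loop on node 3. On infinite lattices the Kleene iteration need not terminate, which is where widening or strategy iteration comes in.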

We denote the transpose of a matrix $A$ by $A^\top$. For $x \in \overline{\mathbb{R}}$, we denote
the column vector $(x, \ldots, x)^\top$ by $\underline{x}$. We denote the $i$-th row (resp. the
$j$-th column) of a matrix $A$ by $A_{i\cdot}$ (resp. $A_{\cdot j}$). Accordingly, $A_{i \cdot j}$
denotes the entry in the $i$-th row and the $j$-th column. We also use this notation for vectors
and mappings $f : X \to Y^k$, i.e., for all $i \in \{1, \ldots, k\}$, the mapping
$f_{i\cdot} : X \to Y$ is given by $f_{i\cdot}(x) = (f(x))_{i\cdot}$ for all $x \in X$. The set
$\overline{\mathbb{R}}^n$ is partially ordered by the component-wise extension of $\le$, which we
again denote by $\le$. That is, for all $x, y \in \overline{\mathbb{R}}^n$, $x \le y$ if and only
if $x_{i\cdot} \le y_{i\cdot}$ for all $i \in \{1, \ldots, n\}$.

A mapping $f : \overline{\mathbb{R}}^n \to \overline{\mathbb{R}}^m$ is called affine if and only if
there exist $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$ such that $f(x) = Ax + b$ for all
$x \in \overline{\mathbb{R}}^n$. Here, we use the convention $-\infty + \infty = -\infty$.
Observe that $f$ is monotone if all entries of $A$ are non-negative. A mapping
$f : \overline{\mathbb{R}}^n \to \overline{\mathbb{R}}$ is called weak-affine if and only if there
exist $a \in \mathbb{R}^n$ and $b \in \mathbb{R}$ such that $f(x) = a^\top x + b$ for all
$x \in \overline{\mathbb{R}}^n$ with $f(x) \ne -\infty$. A mapping
$f : \overline{\mathbb{R}}^n \to \overline{\mathbb{R}}^m$ is called weak-affine if and only if there
exist weak-affine mappings $f_1, \ldots, f_m : \overline{\mathbb{R}}^n \to \overline{\mathbb{R}}$
such that $f = (f_1, \ldots, f_m)^\top$. Every affine mapping is weak-affine, but not vice-versa.
In this article, we are concerned with mappings that are point-wise minimums of finitely many
monotone and weak-affine mappings. Note that these mappings are in particular concave, i.e., the
set of points below the graph of the function is convex.

2.2. Linear Programming. Linear programming aims at optimizing a linear objective
function with respect to linear constraints. In this article, we consider linear programming
problems (LP problems for short) of the form
$\sup \{c^\top x \mid x \in \mathbb{R}^n, Ax \le b\}$. Here, $A \in \mathbb{R}^{m \times n}$,
$b \in \mathbb{R}^m$, and $c \in \mathbb{R}^n$ are the inputs. The convex closed polyhedron
$\{x \in \mathbb{R}^n \mid Ax \le b\}$ is called the feasible space. The LP problem is called
infeasible if and only if the feasible space is empty. An element of the feasible space is called
a feasible solution. A feasible solution $x$ that maximizes $c^\top x$ is called an optimal solution.

If $A$ and $b$ consist only of rational entries, then the feasible space is nonempty if and
only if it contains a rational point. An optimal solution exists if and only if there exists a
rational one. In this article, we always assume that all numbers in the input are rational.
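
For intuition, an LP problem of this form in two variables can be solved exactly over the rationals by enumerating the vertices of the feasible space. The sketch below is a teaching device, not one of the algorithms used in practice: it assumes a nonempty, bounded feasible space (so that an optimal solution exists at a vertex) and integer or rational inputs, and the function name `solve_lp_2d` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def solve_lp_2d(c, A, b):
    """Maximize c.x subject to A x <= b for x in Q^2 by vertex enumeration.

    Assumes a nonempty, bounded feasible space. Returns (value, x) with
    exact rational entries, or None if no feasible vertex is found.
    """
    best = None
    for (r1, b1), (r2, b2) in combinations(list(zip(A, b)), 2):
        det = r1[0] * r2[1] - r1[1] * r2[0]
        if det == 0:
            continue                      # parallel constraints: no vertex
        # Cramer's rule for r1.x = b1, r2.x = b2
        x = (Fraction(b1 * r2[1] - r1[1] * b2, det),
             Fraction(r1[0] * b2 - b1 * r2[0], det))
        if all(row[0] * x[0] + row[1] * x[1] <= bi for row, bi in zip(A, b)):
            value = c[0] * x[0] + c[1] * x[1]
            if best is None or value > best[0]:
                best = (value, x)
    return best

# maximize x + y  subject to  x <= 2, y <= 3, x + 2y <= 7, x >= 0, y >= 0
A = [(1, 0), (0, 1), (1, 2), (-1, 0), (0, -1)]
b = [2, 3, 7, 0, 0]
value, point = solve_lp_2d((1, 1), A, b)
```

The optimum is attained at the rational vertex (2, 5/2), in line with the remark that rational inputs admit rational optimal solutions.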

LP problems can be solved in polynomial time through the ellipsoid method [41] and
interior point methods [57]. However, the running-time of these algorithms crucially depends
on the sizes of occurring numbers. At the danger of an exponential running-time in contrived
cases, we can also instead rely on the simplex algorithm: its worst-case running-time does
not depend on the sizes of occurring numbers (given that arithmetic operations, comparison,
storage and retrieval for numbers are counted for $O(1)$). See for example Dantzig [20] and
Schrijver [57] for more information on linear programming.

2.3. SAT modulo linear real arithmetic. The set of SAT modulo linear real arithmetic
formulas $\Phi$ is defined through the following grammar:

\[ e ::= c \mid x \mid e_1 + e_2 \mid c \cdot e' \qquad \Phi ::= a \mid e_1 \le e_2 \mid e_1 < e_2 \mid \Phi_1 \vee \Phi_2 \mid \Phi_1 \wedge \Phi_2 \mid \neg \Phi' \tag{2.1} \]

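Here, $c \in \mathbb{Q}$ is a constant, $x$ is a real-valued variable, $e, e', e_1, e_2$ are
real-valued linear expressions, $a$ is a Boolean variable and $\Phi, \Phi', \Phi_1, \Phi_2$ are
formulas. An interpretation $I$ for a formula $\Phi$ is a mapping that assigns a real value to
every real-valued variable and a Boolean value to every Boolean variable. We write
$I \models \Phi$ for "$I$ is a model of $\Phi$". That is, we firstly inductively define a function
$[\![e]\!]I$ that evaluates a linear expression $e$ as follows:

\[ [\![c]\!]I = c \qquad [\![x]\!]I = I(x) \qquad [\![e_1 + e_2]\!]I = [\![e_1]\!]I + [\![e_2]\!]I \qquad [\![c \cdot e]\!]I = c \cdot [\![e]\!]I \tag{2.2} \]

Secondly, we inductively define the relation $\models$ as follows:

\[ \begin{array}{ll}
I \models a \iff I(a) = 1 & I \models e_1 \le e_2 \iff [\![e_1]\!]I \le [\![e_2]\!]I \\
I \models e_1 < e_2 \iff [\![e_1]\!]I < [\![e_2]\!]I & I \models \Phi_1 \vee \Phi_2 \iff I \models \Phi_1 \text{ or } I \models \Phi_2 \\
I \models \Phi_1 \wedge \Phi_2 \iff I \models \Phi_1 \text{ and } I \models \Phi_2 & I \models \neg \Phi \iff I \not\models \Phi
\end{array} \tag{2.3} \]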

A formula is called satisfiable if and only if it has at least one model. A formula has a model 
if and only if it has a rational model. 
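
The two inductive definitions translate directly into a recursive evaluator. The sketch below represents expressions and formulas as nested tuples with hypothetical tags ("const", "var", "add", "mul", "bvar", "le", "lt", "or", "and", "not") and decides whether a given interpretation is a model; negation is included on the assumption that the grammar provides it, as the "without negation" restriction of Section 3.2 suggests. It only checks a candidate model and is not a satisfiability solver.

```python
from fractions import Fraction

def eval_expr(e, interp):
    """Value of a linear expression under an interpretation (cf. (2.2))."""
    tag = e[0]
    if tag == "const":
        return Fraction(e[1])
    if tag == "var":
        return Fraction(interp[e[1]])
    if tag == "add":
        return eval_expr(e[1], interp) + eval_expr(e[2], interp)
    if tag == "mul":                       # constant times expression
        return Fraction(e[1]) * eval_expr(e[2], interp)
    raise ValueError(f"unknown expression tag {tag!r}")

def is_model(interp, phi):
    """True if the interpretation is a model of the formula (cf. (2.3))."""
    tag = phi[0]
    if tag == "bvar":
        return interp[phi[1]] == 1
    if tag == "le":
        return eval_expr(phi[1], interp) <= eval_expr(phi[2], interp)
    if tag == "lt":
        return eval_expr(phi[1], interp) < eval_expr(phi[2], interp)
    if tag == "or":
        return is_model(interp, phi[1]) or is_model(interp, phi[2])
    if tag == "and":
        return is_model(interp, phi[1]) and is_model(interp, phi[2])
    if tag == "not":
        return not is_model(interp, phi[1])
    raise ValueError(f"unknown formula tag {tag!r}")

x, y = ("var", "x"), ("var", "y")
# (a and x <= 1) or (x + y < 2 * y)
phi = ("or",
       ("and", ("bvar", "a"), ("le", x, ("const", 1))),
       ("lt", ("add", x, y), ("mul", 2, y)))
```

For instance, the interpretation a = 0, x = 0, y = 1 is a model of `phi` through its second disjunct, whereas a = 1, x = 3, y = 1 is not a model.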

The problem of deciding the satisfiability of SAT modulo linear real arithmetic formulas
is NP-complete. There nevertheless exist efficient solver implementations for this decision
problem, generally based on the DPLL(T) approach, an extension of the DPLL algorithm
for SAT to richer logics. For more information see for example Biere et al. [7], Dutertre
and de Moura [22], and Kroening and Strichman [42]. Such implementations, on satisfiable
instances, can provide a model over Booleans and rational numbers.

In order to simplify notations we also allow matrices, vectors, the relations $\ge$, $>$, $\ne$,
$=$, and the Boolean constants 0 and 1 to occur in SAT modulo linear real arithmetic formulas.



3. The Framework 



3.1. Control Flow Graphs and Collecting Semantics. In this article, we model programs
as control flow graphs, i.e., a program $G$ is a triple $(N, E, st)$, where

(1) $N$ is a finite set of program points,

(2) $E \subseteq N \times \mathrm{Stmt} \times N$ is a finite set of control-flow edges, and

(3) $st \in N$ is the start program point.

A program uses n real- valued variables , . . . , . A state is described by a vector x G M". 
We assign a collecting semantics [sj : 2^" — t- 2^*" to each statement s G Stmt. The 
collecting semantics Is} is an operator that assigns a set [[s]](A) of possible states after 
the execution of s to a set X of possible states before the execution of s. The set Stmt 
of statements is specified subsequently. The collecting semantics ^ of a program G = 
{N, E, st) is finally defined as the least solution of the following constraint system: 

V[st] D M" Y[v] D [sl(V[n]) for all {u,s,v) G E. (3.1) 

Here, for any v ^ N, the variable V[?;] takes values in 2^*". The components of the collecting 
semantics V are denoted by V[v] for all v G N. Throughout this article, we will usually 
denote variables in bold face, and values in normal face. 

3.2. Statements. The set $\mathbf{Stmt}$ of all statements is the set of all SAT modulo linear real
arithmetic formulas without Boolean variables and without negation. Note that non-strict
and strict inequality constraints are permitted. The formula $e_1 \ne e_2$ is also permitted, since
it is an abbreviation for $e_1 < e_2 \lor e_2 < e_1$. We can (in linear time) transform any SAT
modulo linear real arithmetic formula without Boolean variables into this form by pushing
negations to the leaves.
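
The linear-time transformation mentioned above can be sketched as a single recursive pass. This is an illustrative sketch only: the tuple encoding and the name `push_negations` are assumptions, and linear expressions are treated as opaque values. A negated non-strict inequality becomes a strict one with swapped sides, and vice versa, so no negation survives.

```python
def push_negations(phi, negate=False):
    """Return a negation-free formula equivalent to phi (or to its negation
    when negate=True). Formulas are tuples: ("le", e1, e2), ("lt", e1, e2),
    ("and", f1, f2), ("or", f1, f2), ("not", f). Hypothetical encoding."""
    kind = phi[0]
    if kind == "not":
        # A negation flips the polarity of everything below it.
        return push_negations(phi[1], not negate)
    if kind in ("and", "or"):
        if negate:
            # De Morgan: swap the connective under negation.
            kind = "or" if kind == "and" else "and"
        return (kind, push_negations(phi[1], negate), push_negations(phi[2], negate))
    if kind == "le":
        # not (e1 <= e2)  is equivalent to  e2 < e1
        return ("lt", phi[2], phi[1]) if negate else phi
    if kind == "lt":
        # not (e1 < e2)  is equivalent to  e2 <= e1
        return ("le", phi[2], phi[1]) if negate else phi
    raise ValueError(f"unknown formula: {phi!r}")

# Example: not (x <= y and not (y < z))  becomes  (y < x) or (y < z)
negated_example = ("not", ("and", ("le", "x", "y"), ("not", ("lt", "y", "z"))))
```

Each node is visited once and the output has at most as many nodes as the input, which is why the transformation is linear.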

The $\mathbb{R}$-valued variables $\mathbf{x}_1, \ldots, \mathbf{x}_n$ and $\mathbf{x}'_1, \ldots, \mathbf{x}'_n$, which may occur in the formula, play
a particular role. The values of the variables $\mathbf{x}_1, \ldots, \mathbf{x}_n$ represent the values of the program
variables before executing the statement, and the values of the variables $\mathbf{x}'_1, \ldots, \mathbf{x}'_n$ represent
the values of the program variables after executing the statement. For convenience,
we denote the vectors $(\mathbf{x}_1, \ldots, \mathbf{x}_n)^\top$ and $(\mathbf{x}'_1, \ldots, \mathbf{x}'_n)^\top$ also by $\mathbf{x}$ and $\mathbf{x}'$, respectively. In
addition to $\mathbf{x}_1, \ldots, \mathbf{x}_n$ and $\mathbf{x}'_1, \ldots, \mathbf{x}'_n$, the statement may also include other variables, which
may stand for intermediate values computed (or non-deterministically chosen) during the
execution of a program statement. Conceptually, these variables are existentially quantified.

We could also add Boolean variables, at the expense of some additional complexity in
definitions, theorems and proofs. Note that this would not increase the expressiveness, since
a Boolean variable $b$ can be simulated by a real variable $y$ by replacing all occurrences of $b$
by $y = 1$, all occurrences of $\lnot b$ by $y = 0$, and conjoining $(y = 0 \lor y = 1)$ to the formula. In
practice, direct support for Boolean variables may improve efficiency. More
generally, we can accommodate any formula feature that just expresses disjunctions in a
compact way; the only requirement is not to generate negations.

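The collecting semantics $\llbracket s \rrbracket : 2^{\mathbb{R}^n} \to 2^{\mathbb{R}^n}$ of a statement $s \in \mathbf{Stmt}$ is defined by

$$\llbracket s \rrbracket(X) := \{ x' \in \mathbb{R}^n \mid \exists x \in X \,.\, s[x/\mathbf{x}, x'/\mathbf{x}'] \text{ is satisfiable} \} \quad \text{for all } X \subseteq \mathbb{R}^n. \tag{3.2}$$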

Consider the following C-code snippet:

    if (x_1 >= 0)
        x_2 = x_1;
    else
        x_2 = -x_1;

Assume that x_1 and x_2 are of type int and that they are the only numerical variables.
The effect of the C-code snippet can be abstracted by the statement

$$x'_1 = x_1 \land ((x_1 \ge 0 \land x'_2 = x_1) \lor (x_1 < 0 \land x'_2 = -x_1)) \tag{3.3}$$

Note that a conjunct $x'_i = x_i$ is needed for all variables that do not change their values.
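
As a sanity check, one can compare Statement (3.3) with a direct execution of the snippet on a small grid of integer states. The sketch below is illustrative only; the names `run_c_snippet` and `stmt_3_3` are assumptions, and the statement is encoded as a Python predicate over pre-state and post-state values rather than as a formula object.

```python
def run_c_snippet(x1, x2):
    """Execute the if-then-else from the text and return the post-state."""
    if x1 >= 0:
        x2 = x1
    else:
        x2 = -x1
    return x1, x2

def stmt_3_3(x1, x2, x1p, x2p):
    """Statement (3.3) as a predicate; x1p and x2p stand for the primed variables."""
    return x1p == x1 and ((x1 >= 0 and x2p == x1) or (x1 < 0 and x2p == -x1))

GRID = range(-3, 4)

def post_states(x1, x2):
    """All post-states on the grid that Statement (3.3) relates to (x1, x2)."""
    return [(a, b) for a in GRID for b in GRID if stmt_3_3(x1, x2, a, b)]
```

On this grid every pre-state has exactly one related post-state, namely the one produced by the snippet. Dropping the conjunct on the unchanged variable would instead leave its post-value unconstrained.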

A statement $s$ is called merge-simple if and only if it is in disjunctive normal form,
i.e., $s$ is of the form $s_1 \lor \cdots \lor s_k$, where the statements $s_1, \ldots, s_k$ do not use the Boolean
connector $\lor$. Any statement can be rewritten into an equivalent merge-simple statement
in exponential time and space using distributivity. The crux of our main result is that our
algorithm never needs to compute such an exponentially-sized disjunctive normal form.



If we convert Statement (3.3) into an equivalent merge-simple statement using distributivity,
we get:

$$(x'_1 = x_1 \land x_1 \ge 0 \land x'_2 = x_1) \lor (x'_1 = x_1 \land x_1 < 0 \land x'_2 = -x_1) \tag{3.4}$$

A merge-simple statement $s$ that does not use the Boolean connector $\lor$ at all is called
sequential. Intuitively, sequential statements correspond to straight-line sequences of basic
blocks. The merge-simple statement (3.4) non-deterministically chooses between executing
one of the following sequential statements:

$$x'_1 = x_1 \land x_1 \ge 0 \land x'_2 = x_1 \qquad\qquad x'_1 = x_1 \land x_1 < 0 \land x'_2 = -x_1 \tag{3.5}$$
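
The rewriting by distributivity, and the exponential blowup it can cause, can be illustrated with a few lines of code. This is a sketch under assumptions: atoms are opaque tuples tagged `"atom"`, the function names `to_merge_simple` and `chain_of_choices` are hypothetical, and the explicit expansion shown here is exactly what the algorithm of the article avoids.

```python
from itertools import product

def to_merge_simple(phi):
    """Expand a negation-free statement into a list of sequential statements,
    each given as a list of atoms (conjunction). Encoding (hypothetical):
    ("atom", text) | ("and", f1, f2) | ("or", f1, f2)."""
    kind = phi[0]
    if kind == "or":
        # A disjunction contributes the disjuncts of both sides.
        return to_merge_simple(phi[1]) + to_merge_simple(phi[2])
    if kind == "and":
        # Distributivity: every left disjunct is combined with every right one.
        return [l + r for l, r in product(to_merge_simple(phi[1]), to_merge_simple(phi[2]))]
    return [[phi]]

# Statement (3.3) from the text.
A = ("atom", "x1' = x1")
B = ("atom", "x1 >= 0")
C = ("atom", "x2' = x1")
D = ("atom", "x1 < 0")
E = ("atom", "x2' = -x1")
statement_3_3 = ("and", A, ("or", ("and", B, C), ("and", D, E)))

def chain_of_choices(k):
    """A conjunction of k binary disjunctions, e.g. k successive if-then-else."""
    phi = ("or", ("atom", "a0"), ("atom", "b0"))
    for i in range(1, k):
        phi = ("and", phi, ("or", ("atom", f"a{i}"), ("atom", f"b{i}")))
    return phi
```

For Statement (3.3) the expansion yields the two sequential statements of (3.5), while a chain of `k` successive binary choices has size linear in `k` but expands into `2**k` sequential statements.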



3.3. Abstract Semantics. Let $D$ be a complete lattice (for instance the complete lattice
of all $n$-dimensional closed real intervals). Assume that $\alpha : 2^{\mathbb{R}^n} \to D$ and $\gamma : D \to 2^{\mathbb{R}^n}$ form
a Galois connection, i.e., for all $X \subseteq \mathbb{R}^n$ and all $d \in D$, $\alpha(X) \le d$ if and only if $X \subseteq \gamma(d)$.
The abstract semantics $\llbracket s \rrbracket^\sharp : D \to D$ of a statement $s$ is then defined by

$$\llbracket s \rrbracket^\sharp := \alpha \circ \llbracket s \rrbracket \circ \gamma. \tag{3.6}$$

Note that we have chosen to use the best abstract transformer, i.e., the most precise
abstract semantics. All that is needed for soundness is that $\llbracket s \rrbracket \circ \gamma(d) \subseteq \gamma \circ \llbracket s \rrbracket^\sharp(d)$ for all
$d \in D$. Our choice of $\llbracket s \rrbracket^\sharp(d)$, however, is the most accurate sound value.

The abstract semantics $V^\sharp$ of a program $G = (N, E, \mathit{st})$ is the least solution of the
following constraint system:

$$\mathbf{V}^\sharp[\mathit{st}] \ge \alpha(\mathbb{R}^n) \qquad \mathbf{V}^\sharp[v] \ge \llbracket s \rrbracket^\sharp(\mathbf{V}^\sharp[u]) \quad \text{for all } (u, s, v) \in E \tag{3.7}$$

Here, for any $v \in N$, the variable $\mathbf{V}^\sharp[v]$ takes values in $D$. The components of the abstract
semantics $V^\sharp$ are denoted by $V^\sharp[v]$ for all $v \in N$. The abstraction is sound, i.e., the abstract
semantics $V^\sharp$ safely over-approximates the collecting semantics $V$: $\gamma(V^\sharp[v]) \supseteq V[v]$ for
all $v \in N$.



3.4. Template Linear Constraints. In this article we restrict our considerations to template
linear constraint domains as introduced by Sankaranarayanan et al. [56]. We assume
that a template constraint matrix $T \in \mathbb{R}^{m \times n}$ is given. For technical convenience, we always
assume w.l.o.g. that $m \ge 1$ and each row of $T$ contains at least one non-zero entry. The
template linear constraint domain can be identified with the set $\overline{\mathbb{R}}^m$. As shown by Sankaranarayanan
et al. [56], the abstraction $\alpha : 2^{\mathbb{R}^n} \to \overline{\mathbb{R}}^m$ and the concretization $\gamma : \overline{\mathbb{R}}^m \to 2^{\mathbb{R}^n}$,
which are defined by

$$\gamma(d) := \{ x \in \mathbb{R}^n \mid Tx \le d \} \quad \text{for all } d \in \overline{\mathbb{R}}^m, \text{ and} \tag{3.8}$$

$$\alpha(X) := \bigwedge \{ d \in \overline{\mathbb{R}}^m \mid \gamma(d) \supseteq X \} \quad \text{for all } X \subseteq \mathbb{R}^n, \tag{3.9}$$

form a Galois connection.

The template linear constraint domains contain intervals, zones, and octagons [45, 46],
with appropriate choices of the template constraint matrix $T$ [56]. For instance, if we have
two variables $x$ and $y$, and we abstract each variable by an interval as $x \in [-l_x, u_x]$ and
$y \in [-l_y, u_y]$, the vector $d$ is formed of $(l_x, l_y, u_x, u_y)$. Here, the matrix $T$ is given by:



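$$T = \begin{pmatrix} -1 & 0 \\ 0 & -1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}$$

and thus the concretization expresses:

$$\begin{pmatrix} x \\ y \end{pmatrix} \in \gamma \begin{pmatrix} l_x \\ l_y \\ u_x \\ u_y \end{pmatrix}
\iff x \in [-l_x, u_x],\ y \in [-l_y, u_y]
\iff \begin{pmatrix} -1 & 0 \\ 0 & -1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \le \begin{pmatrix} l_x \\ l_y \\ u_x \\ u_y \end{pmatrix}$$
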

While intervals, zones, and octagons are somewhat "obvious" choices, a common discussion 
with respect to template domains is how to find the templates, as opposed to the domain 
of convex polyhedra, where the convex hull and widening operations somewhat "discover" 
interesting directions in space. In this article, we shall assume that template matrices are 
given and refrain from discussing how they were obtained. 
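
For a finite set of points, the abstraction (3.9) and the concretization test (3.8) of the interval template above can be computed directly. The following sketch is an illustration under assumptions: the names `alpha_finite` and `gamma_contains` are hypothetical, and restricting $\alpha$ to finite sets is a simplification (for general sets the supremum requires optimization, as discussed in Section 5).

```python
import math

# Interval template for two variables (x, y); d = (l_x, l_y, u_x, u_y).
T_INTERVALS = [[-1, 0], [0, -1], [1, 0], [0, 1]]

def dot(row, point):
    """Inner product of one template row with a state vector."""
    return sum(r * p for r, p in zip(row, point))

def gamma_contains(T, d, point):
    """Membership test for the concretization: T * point <= d component-wise."""
    return all(dot(row, point) <= bound for row, bound in zip(T, d))

def alpha_finite(T, points):
    """Best abstraction of a finite set of points: for each template row, the
    largest value the row takes on the set (minus infinity for the empty set)."""
    return [max((dot(row, p) for p in points), default=-math.inf) for row in T]

sample_points = [(1, 2), (-3, 5), (0, -1)]
sample_bounds = alpha_finite(T_INTERVALS, sample_points)
```

The resulting bounds describe the box $x \in [-3, 1]$, $y \in [-1, 5]$. It contains points such as $(-3, -1)$ that are not in the original set, which is the over-approximation inherent to the template.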



4. Improving the Precision of the Abstraction
Most abstract interpretation techniques consider a control-flow graph with transitions
expressed as sequential statements only (see formal definition in Sec. 3.2), that is, composed
of atomic guards and assignments. An if-then-else construct with simple constructs (e.g.,
assignments) in both branches is thus expressed as two sequential statements, and a sequence
of two such if-then-else constructs (one from point A to point B and one from B
to C) is expressed as on the left of Figure 2: two sequential statements between A and B,
and two sequential statements between B and C. As noted in the introduction (Sec. 1.2),
abstract interpretation techniques usually abstract the set of reachable states at point B.
This may result in spurious states being considered in the abstraction, which in turn may
result in the analysis tool being unable to prove desirable properties.

In this article, we apply an idea that is very similar to the path focusing technique
of Monniaux and Gonnord [49]. Given a program expressed as a control-flow graph with
sequential statements on the edges, we first compute a feedback vertex set (a.k.a. cut-set)
$S$, that is, a set of control nodes (the feedback vertices) such that removing them cuts all
cycles in the graph. Our original program is equivalent to a program where the only control
nodes are those in the feedback vertex set, but edges carry arbitrary statements instead of
sequential statements only (cf. Sec. 3.2). The results of program analyses on this new graph,
at nodes from the feedback vertex set $S$, are sound invariants for the original program. If
information is needed at other nodes, we can compute it from the information we have for
the nodes from $S$.

Since methods for obtaining compact formulas expressing these statements from the 
original program have already been described in other publications [H], we do not explain 
them in detail. Instead, we provide an example. 

Example 4.1 (Running Example). Throughout this article we use the following C-code
snippet as a running example:

    int x_1, x_2;
    x_1 = 0;
    while (x_1 <= 1000) {
        x_2 = -x_1;
        if (x_2 < 0) x_1 = -2 * x_1;
        else x_1 = -x_1 + 1;
    }

[Figure 3: The program $G_1$ of the running example. Control-flow graph with start point st;
edge labels recoverable from the figure include $x'_1 = 0$, $x_1 \le 1000$, $x'_2 = -x_1$, $x_2 \le -1$,
$x'_1 = -2x_1$, $x_2 \ge 0$, and $x'_1 = -x_1 + 1$.]

This C-code snippet is abstracted through the program $G_1 = (N_1, E_1, \mathit{st})$ depicted in Figure 3.
However, it is not necessary to apply abstraction at every program point, i.e., to assign an
abstract value to each program point. It suffices to apply abstraction at a feedback vertex
set of $G_1$. Since all loops contain the program point 1, $\{1\}$ is a feedback vertex set of $G_1$.
Equivalently to applying abstraction only at program point 1, we can rewrite the control-flow
graph $G_1$ into a control-flow graph $G = (N, E, \mathit{st})$ that is equivalent w.r.t. the collecting
semantics, but contains just the program point st and the program points from the feedback
vertex set $\{1\}$. The result of this transformation, the control-flow graph $G$, is shown
in Figure 4(a).

The programs $G_1$ and $G$ are equivalent w.r.t. their collecting semantics, i.e., $V[v] = V_1[v]$
for all $v \in N$. Here, $V_1$ denotes the collecting semantics of $G_1$ and $V$ denotes the collecting
semantics of $G$. W.r.t. the abstract semantics, $G$ is usually more precise than $G_1$, because
we reduced the number of merge points. In general, we only have $V^\sharp[v] \le V_1^\sharp[v]$ for all $v \in N$,
where $V_1^\sharp$ denotes the abstract semantics of $G_1$ and $V^\sharp$ denotes the abstract semantics of
$G$. This is independent of the abstract domain. □

Let us make a few last remarks regarding the feedback vertex set. Abstract interpretation
techniques usually use such a set to select widening points [16, §4.1.2]. In contrast,
our method uses this set to select the nodes where it over-approximates the set of reachable
states; it does not over-approximate the set of reachable states at other nodes, and widening is
not involved at all. Finding a feedback vertex set of minimal cardinality is an NP-complete
problem if the control-flow graph is arbitrary; such a set can however be found in linear
time if the control-flow graph is reducible (in short, if loops have a single entry point) [59],
which is the case for control-flow graphs directly obtained from structured programs (the
method extends to certain irreducible graphs). The control-flow graph may however become
irreducible if certain optimizations or partitioning techniques are used. A common
heuristic is, for structured programs, to use loop headers, and for unstructured programs
to use the targets of back edges from a depth-first traversal [10, 11]; this heuristic does not
guarantee that the feedback vertex set is minimal with respect to inclusion ordering, let
alone cardinality.
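
The back-edge heuristic mentioned above can be sketched as a depth-first traversal that records every node reached through an edge leading back into the current DFS stack. The code is an illustration only: the function name `back_edge_targets`, the adjacency-dictionary encoding, and in particular the node numbering of the example graph are assumptions (the graph merely mimics the shape of the running example's loop).

```python
def back_edge_targets(successors, start):
    """Return the targets of back edges found by a depth-first traversal
    from `start`. `successors` maps each node to a list of successor nodes.
    Removing the returned nodes cuts every cycle reachable from `start`."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited / on the DFS stack / finished
    color = {}
    targets = set()

    def visit(u):
        color[u] = GREY
        for v in successors.get(u, []):
            state = color.get(v, WHITE)
            if state == WHITE:
                visit(v)
            elif state == GREY:
                targets.add(v)     # edge back into the current DFS stack
        color[u] = BLACK

    visit(start)
    return targets

# Hypothetical numbering of a loop with a two-way branch in its body.
loop_graph = {
    "st": [1],
    1: [2],        # loop guard
    2: [3],        # assignment
    3: [4, 5],     # branch
    4: [1],        # first branch, back to the loop head
    5: [1],        # second branch, back to the loop head
}
```

On this graph the traversal reports the loop head `1` as the only back-edge target. As noted in the text, the set obtained this way is a feedback vertex set but need not be minimal.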



We assume that we are given a Galois connection, and thus in particular monotone best abstract
transformers.

5. Basic Observations
We now note down basic properties of the abstract semantics.

5.1. Abstract Semantics of Statements. Our first observation is that, for all sequential
statements $s$ and all $d \in \overline{\mathbb{R}}^m$, $\llbracket s \rrbracket^\sharp(d)$ can be computed efficiently.

Lemma 5.1 (Sequential Statements). Let $s$ be a sequential statement and $d \in \overline{\mathbb{R}}^m$. The
operator $\llbracket s \rrbracket^\sharp$ is a point-wise minimum of finitely many monotone and weak-affine operators.
For all $d \in \overline{\mathbb{R}}^m$, $\llbracket s \rrbracket^\sharp(d)$ can be computed in polynomial time through linear programming.

Proof. Let $i \in \{1, \ldots, m\}$. We get:

$$\llbracket s \rrbracket^\sharp_{i\cdot}(d) = \sup \{ T_{i\cdot} x' \mid x' \in \llbracket s \rrbracket(\gamma(d)) \} \tag{5.1}$$

$$= \sup \{ T_{i\cdot} x' \mid x' \in \mathbb{R}^n \text{ and } \exists x \,.\, Tx \le d \text{ and } s[x/\mathbf{x}, x'/\mathbf{x}'] \text{ is satisfiable} \} \tag{5.2}$$

Equation (5.2) follows from Equation (5.1) by expansion of the concrete semantics $\llbracket s \rrbracket$ into an
SMT formula and of $\gamma(d)$ into $Tx \le d$. Since $s$ does not contain disjunctions, the optimization
problem in (5.2) aims at optimizing a linear objective function w.r.t. linear constraints
(equalities, strict inequalities, and non-strict inequalities). The optimal value of this
optimization problem can be computed in polynomial time through linear programming. To
check feasibility by standard linear programming techniques (which only allow non-strict
inequalities), we can replace every strict inequality $e_1 < e_2$ by the non-strict inequality
$e_1 \le e_2 - \epsilon$, where $\epsilon > 0$ is appropriately small. Such an appropriately small $\epsilon$ can be computed
in polynomial time. Provided that the optimization problem is feasible, we can then
replace $s[x/\mathbf{x}, x'/\mathbf{x}']$ by $s[x/\mathbf{x}, x'/\mathbf{x}'][\le/<]$. Here, $s[\le/<]$ denotes the statement obtained
from $s$ by replacing every strict inequality relation by a non-strict inequality relation. The
optimal value of the obtained linear programming problem is equal to the optimal value of
the optimization problem (5.2).

It remains to show that $\llbracket s \rrbracket^\sharp_{i\cdot}$ is a point-wise minimum of finitely many monotone and
weak-affine operators. Since $s[x/\mathbf{x}, x'/\mathbf{x}'][\le/<]$ is a conjunction of non-strict linear
inequalities, there exist matrices $A$, $A'$ and $A''$ and a vector $b$ such that, for all $x$ and $x'$,
$s[x/\mathbf{x}, x'/\mathbf{x}'][\le/<]$ is satisfiable if and only if there exists an $x''$ such that $Ax + A'x' + A''x'' \le b$
(the vector $x''$ stands for the other variables in $s$, which are implicitly existentially
quantified). Thus, the optimization problem (5.2) can be rewritten as follows:

$$\llbracket s \rrbracket^\sharp_{i\cdot}(d) = \sup \{ T_{i\cdot} x' \mid x' \in \mathbb{R}^n,\ \exists x \in \mathbb{R}^n \,.\, \exists x'' \,.\, Tx \le d \text{ and } Ax + A'x' + A''x'' \le b \} \tag{5.3}$$

Strong duality [12], also known as Farkas' lemma, thus gives us, provided that $\llbracket s \rrbracket^\sharp_{i\cdot}(d) >
-\infty$, i.e., the optimization problem is feasible, the following equation:

$$\llbracket s \rrbracket^\sharp_{i\cdot}(d) = \inf \{ d^\top y_1 + b^\top y_2 \mid y_1, y_2 \ge 0,\ T^\top y_1 + A^\top y_2 = 0,\ A''^\top y_2 = 0,\ A'^\top y_2 = T_{i\cdot}^\top \} \tag{5.4}$$

Since $y_1 \ge 0$ for all feasible solutions of the linear programming problem in (5.4), $\llbracket s \rrbracket^\sharp_{i\cdot}$
coincides with a point-wise infimum of monotone and affine operators on the set $\{ d \in
\overline{\mathbb{R}}^m \mid \llbracket s \rrbracket^\sharp_{i\cdot}(d) > -\infty \}$. That is, $\llbracket s \rrbracket^\sharp_{i\cdot}$ is a point-wise infimum of monotone and weak-affine
operators. Since the optimal value, provided that it exists, is attained at the vertices of the
feasible space (finitely many), the point-wise infimum is a point-wise minimum of finitely
many monotone and weak-affine operators. □

The max-strategy improvement algorithm we adapt in this article heavily relies on the
fact that, for all sequential statements $s$, $\llbracket s \rrbracket^\sharp$ is a point-wise minimum of finitely many
monotone and weak-affine operators. The latter statement especially implies that $\llbracket s \rrbracket^\sharp$ is
concave (see Gawlitza and Seidl [29] for precise definitions).



The number of vertices in the feasible space of the point-wise infimum in (5.4) may be 
exponential in the size of the original problem, and thus the representation as a point-wise 
minimum of finitely many monotone and weak-affine operators might contain an exponential 
number of such operators. This is not a problem since our algorithm never computes this 
decomposition explicitly. 

Any polynomial-time method for evaluating the abstract semantics of sequential state- 
ments can be used to derive a polynomial-time method for evaluating merge-simple state- 
ments. 

Lemma 5.2 (Merge-Simple Statements). Let $s$ be a merge-simple statement. The operator
$\llbracket s \rrbracket^\sharp$ is a point-wise maximum of finitely many point-wise minima of finitely many monotone
and weak-affine mappings. For all $d \in \overline{\mathbb{R}}^m$, $\llbracket s \rrbracket^\sharp(d)$ can be computed in polynomial time
through linear programming.

Proof. Let $s = s_1 \lor \cdots \lor s_k$, where $s_1, \ldots, s_k$ are sequential statements. Since $\llbracket s \rrbracket^\sharp(d) =
\llbracket s_1 \rrbracket^\sharp(d) \lor \cdots \lor \llbracket s_k \rrbracket^\sharp(d)$, Lemma 5.1 can be applied to provide us with the desired result. □



The problem for arbitrary statements is more difficult. By clear equivalence with satisfiability
solving modulo the theory of linear real arithmetic, we obtain:

Lemma 5.3. The problem of deciding whether, for a given template constraint
matrix $T$ and a given statement $s$, $\llbracket s \rrbracket^\sharp(\infty) > -\infty$ holds is NP-complete. □



5.2. A Trivial Method for Computing Abstract Semantics. Using the results we 
have obtained so far, the abstract semantics of a program G w.r.t. some template constraint 
matrix T can be computed using the following two-step procedure: 

(1) Replace each statement s of the program G with an equivalent merge-simple statement. 
This corresponds to an explicit enumeration of all paths between cut-points, which 
potentially causes an exponential blowup. 

(2) Apply the methods of Gawlitza and Seidl [26] to the obtained program to compute the
abstract semantics $V^\sharp$ of $G$.

Because of the possible exponential blowup, the method described above is impractical
in most cases. Our method eschews this blowup as follows: instead of enumerating all
program paths, we shall visit them only as needed. Guided by a SAT modulo linear real
arithmetic solver, our method selects a path through a statement $s$ only when it is locally
profitable in some sense. In the worst case, an exponential number of paths may be visited
(Section 7.3); but one can hope that this rarely happens in practice. In cases in which our
algorithm needs exponential time, it at least avoids the explicit exponential expansion. It
uses only polynomial space.



Note that we cannot expect a polynomial-time algorithm: because of Lemma 5.3, even without loops,
abstract reachability is NP-hard. Even if all statements are merge-simple, we cannot expect a polynomial-time
algorithm, since the problem of computing the winning regions of parity games is polynomial-time
reducible to abstract reachability [27].

6. Max-Strategy Iteration



This section presents our main contribution. We adapt the max-strategy improvement 
schema of Gawlitza and Seidl [28] to obtain an algorithm to compute abstract semantics in 
the framework of this article. 



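6.1. Notations. Before we go in medias res, we have to introduce some notations. A
system $\mathcal{E}$ of (fixpoint) equations over $\overline{\mathbb{R}}$ is a finite set $\{\mathbf{x}_1 = e_1, \ldots, \mathbf{x}_n = e_n\}$ of equations.
Here, $\mathbf{x}_1, \ldots, \mathbf{x}_n$ are pairwise distinct $\overline{\mathbb{R}}$-valued variables and $e_1, \ldots, e_n$ are expressions over
$\overline{\mathbb{R}}$. We denote the set $\{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$ of variables of $\mathcal{E}$ by $\mathbf{X}_{\mathcal{E}}$. We omit the subscript whenever
it is clear from the context. A function $\rho : \mathbf{X} \to \overline{\mathbb{R}}$ is called a variable assignment. It assigns
the value $\rho(\mathbf{x})$ to each variable $\mathbf{x} \in \mathbf{X}$. Variable assignments are ordered by the point-wise
extension of $\le$ on $\overline{\mathbb{R}}$, i.e., $\rho \le \rho'$ if and only if $\rho(\mathbf{x}) \le \rho'(\mathbf{x})$ for all $\mathbf{x} \in \mathbf{X}$. Since $\overline{\mathbb{R}}$ is
a complete linearly ordered set, the set $\mathbf{X} \to \overline{\mathbb{R}}$ of all variable assignments is a complete
lattice. The semantics $\llbracket e \rrbracket : (\mathbf{X} \to \overline{\mathbb{R}}) \to \overline{\mathbb{R}}$ of an expression $e$ is defined by $\llbracket \mathbf{x} \rrbracket(\rho) := \rho(\mathbf{x})$
and $\llbracket f(e_1, \ldots, e_k) \rrbracket(\rho) := f(\llbracket e_1 \rrbracket(\rho), \ldots, \llbracket e_k \rrbracket(\rho))$, where $\mathbf{x} \in \mathbf{X}$, $f$ is a $k$-ary operator on $\overline{\mathbb{R}}$,
$e_1, \ldots, e_k$ are expressions, and $\rho : \mathbf{X} \to \overline{\mathbb{R}}$ is a variable assignment. We define the operator
$\llbracket \mathcal{E} \rrbracket : (\mathbf{X} \to \overline{\mathbb{R}}) \to \mathbf{X} \to \overline{\mathbb{R}}$ by $\llbracket \mathcal{E} \rrbracket(\rho)(\mathbf{x}) := \llbracket e \rrbracket \rho$ for all equations $\mathbf{x} = e$ of $\mathcal{E}$ and all $\rho : \mathbf{X} \to \overline{\mathbb{R}}$.
A fixpoint equation $\mathbf{x} = e$ is called monotone if and only if all operators
used in $e$ are monotone. Then, the evaluation function $\llbracket e \rrbracket$ of $e$ is monotone, too. Finally,
the operator $\llbracket \mathcal{E} \rrbracket$ is monotone for all systems $\mathcal{E}$ of monotone (fixpoint) equations. A variable
assignment $\rho$ is called a solution (resp. pre-solution, resp. post-solution) of $\mathcal{E}$ if and only if
$\rho = \llbracket \mathcal{E} \rrbracket(\rho)$ (resp. $\rho \le \llbracket \mathcal{E} \rrbracket(\rho)$; resp. $\rho \ge \llbracket \mathcal{E} \rrbracket(\rho)$). The least solution of $\mathcal{E}$ is denoted by $\mu\llbracket \mathcal{E} \rrbracket$. If
the operator $\llbracket \mathcal{E} \rrbracket$ is monotone, then the fixpoint theorem of Knaster/Tarski (Theorem 2.1)
ensures the existence of a uniquely determined least solution $\mu\llbracket \mathcal{E} \rrbracket$. For a system $\mathcal{E}$ of
equations and a pre-solution $\rho$, $\mu_{\ge\rho}\llbracket \mathcal{E} \rrbracket$ denotes the least solution of $\mathcal{E}$ among the solutions
of $\mathcal{E}$ that are greater than or equal to $\rho$, i.e., $\mu_{\ge\rho}\llbracket \mathcal{E} \rrbracket = \min \{ \rho' \mid \rho' = \llbracket \mathcal{E} \rrbracket(\rho') \text{ and } \rho' \ge \rho \}$.
Again, if the operator $\llbracket \mathcal{E} \rrbracket$ is monotone, then the fixpoint theorem of Knaster/Tarski ensures
the existence of $\mu_{\ge\rho}\llbracket \mathcal{E} \rrbracket$, since the set $\{ \rho' \mid \rho' \ge \rho \}$ is a complete lattice.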



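6.2. Rewriting the Abstract Semantic Equations. The first step of our method consists
of rewriting our static analysis problem into a system of monotone fixpoint equations
over $\overline{\mathbb{R}}$. Assume that $G = (N, E, \mathit{st})$ is a program that has $n$ variables, and $T \in \mathbb{R}^{m \times n}$ is a
template constraint matrix. Recall that (w.r.t. $T$) the abstract semantics $V^\sharp$ of $G$ is the least
solution of the following constraint system (cf. (3.7) in Subsection 3.3):

$$\mathbf{V}^\sharp[\mathit{st}] \ge \alpha(\mathbb{R}^n) \qquad \mathbf{V}^\sharp[v] \ge \llbracket s \rrbracket^\sharp(\mathbf{V}^\sharp[u]) \quad \text{for all } (u, s, v) \in E \tag{6.1}$$

The constraint system has exactly one $\overline{\mathbb{R}}^m$-valued variable $\mathbf{V}^\sharp[v]$ for each program point
$v \in N$. For each program point $v \in N$, we decompose the $\overline{\mathbb{R}}^m$-valued variable $\mathbf{V}^\sharp[v]$ into $m$
$\overline{\mathbb{R}}$-valued variables $\mathbf{d}_{v,1}, \ldots, \mathbf{d}_{v,m}$. That is, we set $(\mathbf{d}_{v,1}, \ldots, \mathbf{d}_{v,m})^\top = \mathbf{V}^\sharp[v]$. We obtain the
following constraint system:

$$\mathbf{d}_{\mathit{st},i} \ge \infty \quad \text{for all } i \in \{1, \ldots, m\} \tag{6.2}$$

$$\mathbf{d}_{v,i} \ge \llbracket s \rrbracket^\sharp_{i\cdot}(\mathbf{d}_{u,1}, \ldots, \mathbf{d}_{u,m}) \quad \text{for all } (u, s, v) \in E \text{ and all } i \in \{1, \ldots, m\} \tag{6.3}$$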
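
$$\begin{array}{rcl}
G &=& (N, E, \mathit{st}) \\
N &=& \{\mathit{st}, 1\} \\
E &=& \{(\mathit{st},\ x'_1 = 0,\ 1),\ (1, s, 1)\} \\
s &=& \Phi \land (\Phi_1 \lor \Phi_2) \\
\Phi &=& x_1 \le 1000 \land x'_2 = -x_1 \\
\Phi_1 &=& x'_2 \le -1 \land x'_1 = -2x_1 \\
\Phi_2 &=& -x'_2 \le 0 \land x'_1 = -x_1 + 1
\end{array}$$

(a) The program $G$

$$T = \begin{pmatrix} 1 & 0 \\ -1 & 0 \end{pmatrix}$$

(b) The template constraint matrix $T$
(only $x_1$ is taken into account in the template, thus the zero right column)

$$\begin{array}{rcl}
\mathbf{d}_{\mathit{st},1} &=& \infty \\
\mathbf{d}_{\mathit{st},2} &=& \infty \\
\mathbf{d}_{1,1} &=& \max \{ \llbracket x'_1 = 0 \rrbracket^\sharp_{1\cdot}(\mathbf{d}_{\mathit{st},1}, \mathbf{d}_{\mathit{st},2}),\ \llbracket s \rrbracket^\sharp_{1\cdot}(\mathbf{d}_{1,1}, \mathbf{d}_{1,2}) \} \\
\mathbf{d}_{1,2} &=& \max \{ \llbracket x'_1 = 0 \rrbracket^\sharp_{2\cdot}(\mathbf{d}_{\mathit{st},1}, \mathbf{d}_{\mathit{st},2}),\ \llbracket s \rrbracket^\sharp_{2\cdot}(\mathbf{d}_{1,1}, \mathbf{d}_{1,2}) \}
\end{array}$$

(c) The equation system $\mathcal{E}(G, T)$

Figure 4: The running example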



The fixpoint theorem of Knaster/Tarski (Theorem 2.1) ensures that the least solution of
the above system of inequalities is the least solution of the following equation system:

$$\mathbf{d}_{\mathit{st},i} = \infty \quad \text{for all } i \in \{1, \ldots, m\} \tag{6.4}$$

$$\mathbf{d}_{v,i} = \max \{ \llbracket s \rrbracket^\sharp_{i\cdot}(\mathbf{d}_{u,1}, \ldots, \mathbf{d}_{u,m}) \mid (u, s, v) \in E \} \quad \text{for all } v \in N \setminus \{\mathit{st}\},\ i \in \{1, \ldots, m\} \tag{6.5}$$

We denote the above system of fixpoint equations by $\mathcal{E}(G, T)$. From Section 5 we know that
the right-hand sides of $\mathcal{E}(G, T)$ are point-wise maxima of finitely many point-wise minima
of finitely many weak-affine operators. We summarize the properties of $\mathcal{E}(G, T)$:

Lemma 6.1. Let $G$ be a program and $V^\sharp$ its abstract semantics (w.r.t. the template constraint
matrix $T \in \mathbb{R}^{m \times n}$). Let $\rho^* := \mu\llbracket \mathcal{E}(G, T) \rrbracket$ be the least solution of $\mathcal{E}(G, T)$. Then
$(V^\sharp[v])_{i\cdot} = \rho^*(\mathbf{d}_{v,i})$ for all program points $v \in N$ and all $i \in \{1, \ldots, m\}$. The right-hand sides
of $\mathcal{E}(G, T)$ are point-wise maxima of finitely many point-wise minima of finitely many weak-affine
operators. Thus, they are in particular point-wise maxima of finitely many monotone
and concave functions. □

Example 6.2. We again consider our running example specified in Figure 4(a). We want
to perform the analysis w.r.t. the template constraint matrix $T$ specified in Figure 4(b).
The resulting equation system $\mathcal{E}(G, T)$ is shown in Figure 4(c).

The least solution $\rho^* := \mu\llbracket \mathcal{E}(G, T) \rrbracket$ of $\mathcal{E}(G, T)$ is given by $\rho^* = \{\mathbf{d}_{\mathit{st},1} \mapsto \infty,\ \mathbf{d}_{\mathit{st},2} \mapsto
\infty,\ \mathbf{d}_{1,1} \mapsto 2001,\ \mathbf{d}_{1,2} \mapsto 2000\}$. Thus, by Lemma 6.1, $V^\sharp[\mathit{st}] = (\infty, \infty)$, and $V^\sharp[1] =
(2001, 2000)$. In consequence, all possible values of the program variable $x_1$ at program
point 1 are in the interval $[-2000, 2001]$. □
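
The value $(2001, 2000)$ can be cross-checked with a small script. The sketch below deliberately does not implement max-strategy iteration: it uses plain Kleene iteration without widening, which happens to terminate on this particular example but is not guaranteed to do so in general. The function names are assumptions, and the abstract transformers are hand-derived for this one-variable template (each branch of $s$ restricts $x_1$ to an interval and then applies an affine map) rather than computed by linear programming.

```python
import math

NEG_INF = -math.inf

def post_init():
    """Abstract effect of x1' = 0: afterwards x1 <= 0 and -x1 <= 0."""
    return (0, 0)

def post_loop(d1, d2):
    """Abstract effect of s = Phi and (Phi1 or Phi2) on d = (d1, d2),
    where d encodes x1 <= d1 and -x1 <= d2 (hand-derived for this example)."""
    if d1 == NEG_INF or d2 == NEG_INF or -d2 > d1:
        return (NEG_INF, NEG_INF)          # no pre-state satisfies the bounds
    lo, hi = -d2, d1
    up, up_neg = NEG_INF, NEG_INF          # bounds on x1' and on -x1'
    # Branch Phi and Phi1: 1 <= x1 <= 1000 and x1' = -2 * x1
    a, b = max(lo, 1), min(hi, 1000)
    if a <= b:
        up, up_neg = max(up, -2 * a), max(up_neg, 2 * b)
    # Branch Phi and Phi2: x1 <= 0 and x1' = -x1 + 1
    a, b = lo, min(hi, 0)
    if a <= b:
        up, up_neg = max(up, 1 - a), max(up_neg, b - 1)
    return (up, up_neg)

def kleene_iteration(max_steps=1000):
    """Iterate the right-hand sides for (d_{1,1}, d_{1,2}) from minus infinity.
    The components for st are constantly infinity, so the edge from st
    always contributes post_init()."""
    d = (NEG_INF, NEG_INF)
    for _ in range(max_steps):
        init, loop = post_init(), post_loop(*d)
        new = (max(init[0], loop[0]), max(init[1], loop[1]))
        if new == d:
            return d
        d = new
    raise RuntimeError("no fixpoint reached within max_steps")
```

Because the iteration starts from the bottom element and each step applies monotone right-hand sides, the fixpoint it reaches is the least one. In general such an ascending chain need not stabilize after finitely many steps, which is why the article relies on strategy iteration.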

6.3. Adapting the Max-Strategy Improvement Algorithm. Following the lines of
Gawlitza and Seidl [29], our starting point is a system $\mathcal{E}$ of monotone fixpoint equations of
the form $\mathbf{x} = \max \mathcal{E}_{\mathbf{x}}$, where $\mathbf{x}$ is an $\overline{\mathbb{R}}$-valued variable, and $\mathcal{E}_{\mathbf{x}}$ is a finite set of monotone and
concave expressions over $\overline{\mathbb{R}}$. An expression $e$ is called monotone (resp. concave) if and only
if $\llbracket e \rrbracket$ is monotone (resp. concave). We treat a function from the finite set $\mathbf{X}$ of variables
to $\overline{\mathbb{R}}$ as a vector of $|\mathbf{X}|$ elements from $\overline{\mathbb{R}}$. In our application (recall that we aim at solving
the equation system $\mathcal{E}(G, T)$), the sets $\mathcal{E}_{\mathbf{x}}$ are implicitly and succinctly given by the right-hand
sides of equations of the forms (6.4) and (6.5). Indeed, every expression of the form
$\llbracket s \rrbracket^\sharp_{i\cdot}(\mathbf{d}_{u,1}, \ldots, \mathbf{d}_{u,m})$, found on the right-hand side of such equations, can be equivalently
rewritten into $\max \{ \llbracket s_1 \rrbracket^\sharp_{i\cdot}(\mathbf{d}_{u,1}, \ldots, \mathbf{d}_{u,m}), \ldots, \llbracket s_k \rrbracket^\sharp_{i\cdot}(\mathbf{d}_{u,1}, \ldots, \mathbf{d}_{u,m}) \}$, where $s_1, \ldots, s_k$ are
(potentially exponentially many) sequential statements. Since $s_1, \ldots, s_k$ are sequential, the
operators $\llbracket s_1 \rrbracket^\sharp_{i\cdot}, \ldots, \llbracket s_k \rrbracket^\sharp_{i\cdot}$ are point-wise minima of finitely many monotone and weak-affine
operators; hence they are monotone and concave operators.

One obvious way to solve the system $\mathcal{E}$ of equations is to perform the above-mentioned
rewriting explicitly and then apply the max-strategy improvement algorithm. To avoid
this impractical exponential blowup, in what follows we modify the algorithm such that it
works directly on the succinct representation.

Assume that $\mathcal{E}$ denotes a system of fixpoint equations of the form $\mathbf{x} = \max \mathcal{E}_{\mathbf{x}}$, where
$\mathcal{E}_{\mathbf{x}}$ is a finite set of monotone and concave expressions over $\overline{\mathbb{R}}$. A max-strategy $\sigma$ for $\mathcal{E}$ is a
system of equations such that, for each equation $\mathbf{x} = e$ of $\sigma$, one of the following statements
holds:

(1) $e$ is $-\infty$.

(2) $e \in \mathcal{E}_{\mathbf{x}}$, where $\mathbf{x} = \max \mathcal{E}_{\mathbf{x}}$ is an equation of $\mathcal{E}$.

Intuitively, a max-strategy picks for each maximum operator one of its operands. For a
system $\mathcal{E}$ of equations, we denote the set of all max-strategies by $\Sigma_{\mathcal{E}}$. In our application,
the cardinality of $\Sigma_{\mathcal{E}}$ is exponential in the size of $\mathcal{E}$. To be more precise, it is in $O(2^n)$,
where $n$ denotes the size of $\mathcal{E}$. Enumerating all max-strategies is therefore impractical.

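Example 6.3. We continue our running example (Figure 4). Consider the system $\mathcal{E}(G, T)$
and note that $s = \Phi \land (\Phi_1 \lor \Phi_2) = (\Phi \land \Phi_1) \lor (\Phi \land \Phi_2)$; therefore the equation

$$\mathbf{d}_{1,1} = \max \{ \llbracket x'_1 = 0 \rrbracket^\sharp_{1\cdot}(\mathbf{d}_{\mathit{st},1}, \mathbf{d}_{\mathit{st},2}),\ \llbracket s \rrbracket^\sharp_{1\cdot}(\mathbf{d}_{1,1}, \mathbf{d}_{1,2}) \} \tag{6.6}$$

can be equivalently rewritten into

$$\mathbf{d}_{1,1} = \max \{ \llbracket x'_1 = 0 \rrbracket^\sharp_{1\cdot}(\mathbf{d}_{\mathit{st},1}, \mathbf{d}_{\mathit{st},2}),\ \llbracket \Phi \land \Phi_1 \rrbracket^\sharp_{1\cdot}(\mathbf{d}_{1,1}, \mathbf{d}_{1,2}),\ \llbracket \Phi \land \Phi_2 \rrbracket^\sharp_{1\cdot}(\mathbf{d}_{1,1}, \mathbf{d}_{1,2}) \}. \tag{6.7}$$

Recall that this expansion is solely for the purpose of proving properties: it is not done in
the algorithm. The equation system $\sigma$ consisting of the equations

$$\mathbf{d}_{\mathit{st},1} = \infty \qquad \mathbf{d}_{1,1} = \llbracket \Phi \land \Phi_2 \rrbracket^\sharp_{1\cdot}(\mathbf{d}_{1,1}, \mathbf{d}_{1,2}) \qquad \mathbf{d}_{\mathit{st},2} = \infty \qquad \mathbf{d}_{1,2} = \llbracket x'_1 = 0 \rrbracket^\sharp_{2\cdot}(\mathbf{d}_{\mathit{st},1}, \mathbf{d}_{\mathit{st},2}) \tag{6.8}$$

is thus a max-strategy for this system. □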



For a precise definition of concavity for functions from $\overline{\mathbb{R}}^n \to \overline{\mathbb{R}}^m$, we refer to Gawlitza and Seidl
[31]. For this article, however, a precise treatment of these issues is not required. We just mention concavity
to give a better intuition.

A crucial notion we need in the following is the notion of improvements. Let $\sigma$ be a max-strategy
for $\mathcal{E}$ and $\rho$ a pre-solution of $\sigma$. A max-strategy $\sigma'$ for $\mathcal{E}$ is called an improvement
of $\sigma$ w.r.t. $\rho$ if and only if the following conditions are fulfilled:

(1) If $\rho < \llbracket \mathcal{E} \rrbracket(\rho)$, then $\llbracket \sigma' \rrbracket(\rho) > \rho$.

(2) If $\mathbf{x} = e$ is an equation of $\sigma$ and $\mathbf{x} = e'$ is an equation of $\sigma'$ with $e \ne e'$, then $\llbracket e' \rrbracket(\rho) >
\llbracket e \rrbracket(\rho)$.

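Example 6.4. We continue our running example (Figure 4). We consider the equation
system $\sigma'$ that consists of the following equations:

$$\mathbf{d}_{\mathit{st},1} = \infty \qquad \mathbf{d}_{1,1} = \llbracket \Phi \land \Phi_2 \rrbracket^\sharp_{1\cdot}(\mathbf{d}_{1,1}, \mathbf{d}_{1,2}) \qquad \mathbf{d}_{\mathit{st},2} = \infty \qquad \mathbf{d}_{1,2} = \llbracket \Phi \land \Phi_1 \rrbracket^\sharp_{2\cdot}(\mathbf{d}_{1,1}, \mathbf{d}_{1,2}) \tag{6.9}$$

The equation system $\sigma'$ is a max-strategy of $\mathcal{E}(G, T)$ and moreover an improvement of the
max-strategy $\sigma$ (defined in Example 6.3) w.r.t. the variable assignment

$$\rho := \{\mathbf{d}_{\mathit{st},1} \mapsto \infty,\ \mathbf{d}_{\mathit{st},2} \mapsto \infty,\ \mathbf{d}_{1,1} \mapsto 1,\ \mathbf{d}_{1,2} \mapsto 0\}. \tag{6.10}$$

It is an improvement, since $\llbracket \llbracket \Phi \land \Phi_1 \rrbracket^\sharp_{2\cdot}(\mathbf{d}_{1,1}, \mathbf{d}_{1,2}) \rrbracket(\rho) = 2 > 0 = \llbracket \llbracket x'_1 = 0 \rrbracket^\sharp_{2\cdot}(\mathbf{d}_{\mathit{st},1}, \mathbf{d}_{\mathit{st},2}) \rrbracket(\rho)$:
under $\rho$ we have $x_1 \in [0, 1]$, the guard of $\Phi \land \Phi_1$ forces $x_1 = 1$, and hence $-x'_1 = 2$.
In this example, $\sigma'$ is the only improvement of $\sigma$ w.r.t. $\rho$. □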

Note that, for a max-strategy $\sigma$ and a pre-solution $\rho$ of $\sigma$, there might be several max-strategies
$\sigma'$ that are improvements of $\sigma$ w.r.t. $\rho$. Consider, for instance, the equation
system $\mathcal{E} = \{\mathbf{x} = \max\{0, 1, 2\}\}$. Both max-strategies $\{\mathbf{x} = 1\}$ and $\{\mathbf{x} = 2\}$ are
improvements of the max-strategy $\{\mathbf{x} = 0\}$. For the results we are going to develop in this
article, it is not important which improvement we choose: this will neither affect the final
result obtained, nor change the worst-case complexity bounds that we prove. It is however
possible that different heuristics may lead to different practical complexities.

The max-strategy improvement algorithm starts with the max-strategy $\sigma_0 := \{\mathbf{x} =
-\infty \mid \mathbf{x} \in \mathbf{X}\}$ and the variable assignment $\rho_0 := \{\mathbf{x} \mapsto -\infty \mid \mathbf{x} \in \mathbf{X}\}$. The algorithm
successively performs the following two steps in the given order until it has found the least
solution:

(1) Improve the max-strategy $\sigma$ w.r.t. $\rho$.

(2) Evaluate the max-strategy $\sigma$ w.r.t. $\rho$ to obtain a new value for $\rho$.

In pseudo-code, we can formulate it as follows:

Algorithm 1 The Max-Strategy Improvement Algorithm

1: $\sigma \leftarrow \sigma_0$;
2: $\rho \leftarrow \rho_0$;
3: while ($\rho < \llbracket \mathcal{E} \rrbracket(\rho)$) {
4:     $\sigma \leftarrow$ improvement of $\sigma$ w.r.t. $\rho$;
5:     $\rho \leftarrow \mu_{\ge\rho}\llbracket \sigma \rrbracket$;
6: }
7: return $\rho$;
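
The control structure of Algorithm 1 can be sketched on explicitly represented systems. The sketch is illustrative and rests on loud assumptions: each right-hand side is given as an explicit list of Python callables (the article's point is precisely to avoid such an explicit list), an improvement always switches to an operand of maximal value, and strategies are evaluated by plain Kleene iteration from the current assignment instead of the linear-programming method of the article, so termination is only guaranteed for benign inputs such as the two examples. All names are hypothetical.

```python
import math

def evaluate_strategy(strategy, rho, max_rounds=10_000):
    """Least solution of `strategy` above the pre-solution rho, computed by
    plain Kleene iteration (an assumption; it may not terminate in general)."""
    current = dict(rho)
    for _ in range(max_rounds):
        following = {x: e(current) for x, e in strategy.items()}
        if following == current:
            return current
        current = following
    raise RuntimeError("strategy evaluation did not stabilize")

def max_strategy_iteration(system):
    """`system` maps each variable x to the list of operands of x = max {...};
    every operand is a callable taking a variable assignment (a dict)."""
    def minus_infinity(_rho):
        return -math.inf
    sigma = {x: minus_infinity for x in system}      # line 1: sigma_0
    rho = {x: -math.inf for x in system}             # line 2: rho_0
    while True:
        improved = False
        new_sigma = dict(sigma)
        for x, operands in system.items():
            best = max(operands, key=lambda e: e(rho))
            if best(rho) > sigma[x](rho):            # strictly better operand
                new_sigma[x] = best
                improved = True
        if not improved:                             # rho solves the system
            return rho
        sigma = new_sigma                            # line 4
        rho = evaluate_strategy(sigma, rho)          # line 5

# The system {x = max{0, 1, 2}} from the text.
constants_system = {"x": [lambda r: 0, lambda r: 1, lambda r: 2]}
# A hypothetical system with a self-dependency: x = max{0, min(x + 1, 10)}.
counter_system = {"x": [lambda r: 0, lambda r: min(r["x"] + 1, 10)]}
```

On the first system the sketch jumps directly to the operand `2`; on the second it first selects the constant `0`, then switches to the second operand and evaluates it to `10`. Choosing a maximal operand is only one possible improvement rule; as stated above, any improvement leads to the same final result.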



For all $i \in \mathbb{N}$, let $\rho_i$ be the value of the variable $\rho$ and $\sigma_i$ be the value of the variable $\sigma$ after
the $i$-th evaluation of the loop body. We have:

Lemma 6.5 ([31], [28, Lem. 6.7]). The following statements hold for all $i \in \mathbb{N}$:

(1) $\rho_i \le \mu\llbracket \mathcal{E} \rrbracket$.

(2) $\rho_i \le \llbracket \sigma_{i+1} \rrbracket(\rho_i)$.

(3) If $\rho_i < \mu\llbracket \mathcal{E} \rrbracket$, then $\rho_{i+1} > \rho_i$.

(4) If $\rho_i = \mu\llbracket \mathcal{E} \rrbracket$, then $\rho_{i+1} = \rho_i$. □

The above lemma implies that the algorithm returns the least solution whenever it terminates.
Whether or not it terminates depends on the properties of the class of fixpoint
equation systems under consideration. In our application, we aim at computing the least
solution of the equation system $\mathcal{E}(G, T)$ (see Subsection 6.2). By Lemma 6.1 the right-hand
sides of $\mathcal{E}(G, T)$ are point-wise maxima of finitely many monotone and concave functions.
More specifically, the right-hand sides are point-wise maxima of finitely many point-wise
minima of finitely many weak-affine operators. This property guarantees the termination
of the max-strategy improvement algorithm [31], [28, §6.1]. At the latest, it terminates after
considering each max-strategy at most linearly often (see Lemma 6.8). Before we explain
the remaining building blocks, i.e., how to execute program lines 4 and 5, we consider an
example.

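Example 6.6. We consider our running example. That is, we aim at computing the least
solution of the equation system $\mathcal{E}(G, T)$ shown in Figure 4. Running the algorithm can, for
instance, give us the following trace:

$$\begin{align}
\sigma_0 &:= \{\mathbf{d}_{\mathit{st},1} = -\infty,\ \mathbf{d}_{\mathit{st},2} = -\infty,\ \mathbf{d}_{1,1} = -\infty,\ \mathbf{d}_{1,2} = -\infty\} \tag{6.11} \\
\rho_0 &:= \{\mathbf{d}_{\mathit{st},1} \mapsto -\infty,\ \mathbf{d}_{\mathit{st},2} \mapsto -\infty,\ \mathbf{d}_{1,1} \mapsto -\infty,\ \mathbf{d}_{1,2} \mapsto -\infty\} \tag{6.12} \\
\sigma_1 &:= \{\mathbf{d}_{\mathit{st},1} = \infty,\ \mathbf{d}_{\mathit{st},2} = \infty,\ \mathbf{d}_{1,1} = -\infty,\ \mathbf{d}_{1,2} = -\infty\} \tag{6.13} \\
\rho_1 &:= \{\mathbf{d}_{\mathit{st},1} \mapsto \infty,\ \mathbf{d}_{\mathit{st},2} \mapsto \infty,\ \mathbf{d}_{1,1} \mapsto -\infty,\ \mathbf{d}_{1,2} \mapsto -\infty\} \tag{6.14} \\
\sigma_2 &:= \{\mathbf{d}_{\mathit{st},1} = \infty,\ \mathbf{d}_{\mathit{st},2} = \infty,\ \mathbf{d}_{1,1} = \llbracket x'_1 = 0 \rrbracket^\sharp_{1\cdot}(\mathbf{d}_{\mathit{st},1}, \mathbf{d}_{\mathit{st},2}), \tag{6.15} \\
&\phantom{:= \{} \mathbf{d}_{1,2} = \llbracket x'_1 = 0 \rrbracket^\sharp_{2\cdot}(\mathbf{d}_{\mathit{st},1}, \mathbf{d}_{\mathit{st},2})\} \tag{6.16} \\
\rho_2 &:= \{\mathbf{d}_{\mathit{st},1} \mapsto \infty,\ \mathbf{d}_{\mathit{st},2} \mapsto \infty,\ \mathbf{d}_{1,1} \mapsto 0,\ \mathbf{d}_{1,2} \mapsto 0\} \tag{6.17} \\
\sigma_3 &:= \{\mathbf{d}_{\mathit{st},1} = \infty,\ \mathbf{d}_{\mathit{st},2} = \infty,\ \mathbf{d}_{1,1} = \llbracket \Phi \land \Phi_2 \rrbracket^\sharp_{1\cdot}(\mathbf{d}_{1,1}, \mathbf{d}_{1,2}), \tag{6.18} \\
&\phantom{:= \{} \mathbf{d}_{1,2} = \llbracket x'_1 = 0 \rrbracket^\sharp_{2\cdot}(\mathbf{d}_{\mathit{st},1}, \mathbf{d}_{\mathit{st},2})\} \tag{6.19} \\
\rho_3 &:= \{\mathbf{d}_{\mathit{st},1} \mapsto \infty,\ \mathbf{d}_{\mathit{st},2} \mapsto \infty,\ \mathbf{d}_{1,1} \mapsto 1,\ \mathbf{d}_{1,2} \mapsto 0\} \tag{6.20} \\
\sigma_4 &:= \{\mathbf{d}_{\mathit{st},1} = \infty,\ \mathbf{d}_{\mathit{st},2} = \infty,\ \mathbf{d}_{1,1} = \llbracket \Phi \land \Phi_2 \rrbracket^\sharp_{1\cdot}(\mathbf{d}_{1,1}, \mathbf{d}_{1,2}), \tag{6.21} \\
&\phantom{:= \{} \mathbf{d}_{1,2} = \llbracket \Phi \land \Phi_1 \rrbracket^\sharp_{2\cdot}(\mathbf{d}_{1,1}, \mathbf{d}_{1,2})\} \tag{6.22} \\
\rho_4 &:= \{\mathbf{d}_{\mathit{st},1} \mapsto \infty,\ \mathbf{d}_{\mathit{st},2} \mapsto \infty,\ \mathbf{d}_{1,1} \mapsto 2001,\ \mathbf{d}_{1,2} \mapsto 2000\} \tag{6.23}
\end{align}$$

Here, for all $i$, $\rho_{i+1} = \mu_{\ge\rho_i}\llbracket \sigma_{i+1} \rrbracket$ and $\sigma_{i+1}$ is an improvement of $\sigma_i$ w.r.t. $\rho_i$. The variable
assignment $\rho_4$ is a solution of $\mathcal{E}(G, T)$. The max-strategy improvement algorithm terminates with the
correct least solution, which is $\rho_4$. □

We now present methods to evaluate max-strategies (Line 5 of Algorithm 1) and to improve
max-strategies (Line 4 of Algorithm 1).
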

6.4. Evaluating Max-Strategies. We restrict our consideration to our application. That
is, we assume that the equation system $\mathcal{E}$ is given by $\mathcal{E} = \mathcal{E}(G,T)$ for some program $G$ and
some template constraint matrix $T$. For all $i \in \mathbb{N}$, this allows us to compute $\rho_i$ as follows:

Lemma 6.7 ([31], [28]). Let $i \in \mathbb{N}$. Recall that, by construction, $\rho_{i+1} = \mu_{\ge\rho_i}[\![\sigma_{i+1}]\!]$. The
variable assignment $\rho_{i+1}$ can be computed as follows: Let $\mathcal{E}'$ denote the system of equations
that is obtained from the equation system $\sigma_{i+1}$ by performing the following steps:

(1) Remove every equation $x = e$ where $[\![e]\!](\rho_i) = -\infty$, and then replace the remaining
occurrences of $x$ with the constant $-\infty$.

(2) Remove every equation $x = e$ where $[\![e]\!](\rho_i) = \infty$, and then replace the remaining
occurrences of $x$ with the constant $\infty$.

For all equations $x = e$ of the equation system $\sigma_{i+1}$ with $-\infty < [\![e]\!](\rho_i) < \infty$, we can
compute $\rho_{i+1}(x)$ as follows:

$$\rho_{i+1}(x) = \sup \{\rho(x) \mid \rho : X_{\mathcal{E}'} \to \mathbb{R},\ \rho \le [\![\mathcal{E}']\!](\rho)\} \tag{6.24}$$

The value $\rho_{i+1}$ only depends on the equation system $\sigma_{i+1}$ and the set of variables already
identified to be $\infty$, namely, $\{x \mid x = e \text{ is an equation of } \sigma_{i+1} \text{ with } [\![e]\!](\rho_i) = \infty\}$. □

In consequence, the max-strategy improvement algorithm has to consider each max-strategy
at most $|X|$ times. Hence, we have:

Lemma 6.8 ([31], [28]). The max-strategy improvement algorithm terminates after at most
$|X| \cdot |\Sigma_{\mathcal{E}}|$ max-strategy improvement steps. □



Lemma 6.7 gives us a method for computing $\rho_i$. For each variable $x \in X$, we have to
compute

$$\sup \{\rho(x) \mid \rho : X_{\mathcal{E}'} \to \mathbb{R} \text{ and } \rho \le [\![\mathcal{E}']\!](\rho)\}. \tag{6.25}$$

The equations of $\mathcal{E}'$ are of the form $b = [\![s]\!]^\sharp_{i\cdot}(b_1, \ldots, b_m)$, where $b, b_1, \ldots, b_m$ are $\mathbb{R}$-valued
variables, and $s$ is a sequential statement. Thus, by Lemma 5.1, the right-hand sides are
point-wise minima of finitely many monotone and weak-affine functions. Hence, they are
monotone and concave. Therefore, (6.25) represents a convex optimization problem.

The above convex optimization problem is of a very special form. The right-hand sides
are parameterized linear programs. In consequence, the convex optimization problem can be
rewritten into an equivalent linear programming problem as follows: In accordance with (5.1)
and (5.2), in $\mathcal{E}'$, we replace each equation $b = [\![s]\!]^\sharp_{i\cdot}(b_1, \ldots, b_m)$ with the following linear
constraints:

$$b \le T_{i\cdot}(y_1', \ldots, y_n')^\top \tag{6.26}$$

$$\Phi \tag{6.27}$$

$$T(y_1, \ldots, y_n)^\top \le (b_1, \ldots, b_m)^\top \tag{6.28}$$

Here, $y_1, \ldots, y_n, y_1', \ldots, y_n'$ are fresh variables. $\Phi$ is a set of linear inequalities that is
obtained from the sequential statement $s$ by

(1) replacing the variables $x_1, \ldots, x_n, x_1', \ldots, x_n'$ with the fresh variables $y_1, \ldots, y_n, y_1', \ldots, y_n'$,

(2) replacing all other variables of $s$ with fresh variables, and

(3) replacing every strict inequality $<$ with a non-strict inequality $\le$.

We denote the resulting constraint system by $C$. By construction, we have:

$$\sup \{\rho(x) \mid \rho : X_{\mathcal{E}'} \to \mathbb{R} \text{ and } \rho \le [\![\mathcal{E}']\!](\rho)\} = \sup \{\rho(x) \mid \rho : X_{\mathcal{E}'} \to \mathbb{R} \text{ and } \rho \text{ solves } C\} \tag{6.29}$$

The construction can be carried out in polynomial time. Since $C$ is a set of linear constraints,
we can use linear programming to compute the optimal value. We have:

Lemma 6.9 (Evaluating Max-Strategies). Whenever our max-strategy improvement
algorithm has to compute $\mu_{\ge\rho}[\![\sigma]\!]$, this can be performed by solving $|X|$ linear programming
problems of polynomial size. The linear programming problems only depend on $\sigma$ and
the set $\{x \mid x = e \text{ is an equation of } \sigma \text{ with } [\![e]\!](\rho) = \infty\}$. □



Example 6.10. We now discuss how to compute $\rho_3 := \mu_{\ge\rho_2}[\![\sigma_3]\!]$ from Example 6.6. Note
that the values of the variables $d_{st,1}$ and $d_{st,2}$ are already known to be $\infty$. It remains to
determine the values for the variables $d_{1,1}$ and $d_{1,2}$. According to Lemma 6.7, we have

$$\rho_3(d_{1,1}) = \sup \{d_{1,1} \mid d_{1,1}, d_{1,2} \in \mathbb{R},\ d_{1,1} \le [\![\Phi \land \Phi_2]\!]^\sharp_{1\cdot}(d_{1,1}, d_{1,2}),\ d_{1,2} \le [\![x_1' = 0]\!]^\sharp_{2\cdot}(\infty, \infty)\} \tag{6.30}$$

Observe that $\Phi \land \Phi_2$ can be equivalently rewritten into $x_1 \le 0 \land x_1' = -x_1 + 1 \land x_2' = -x_1$.
Thus, according to the above observations, $\rho_3(d_{1,1})$ is the optimal value of the following
linear programming problem:

$$\max\ d_{1,1} \quad \text{s.t.} \quad d_{1,1} \le -x_1 + 1,\quad x_1 \le 0,\quad x_1 \le d_{1,1},\quad -x_1 \le d_{1,2},\quad d_{1,2} \le 0 \tag{6.31}$$

Since the optimal value is 1, we get $\rho_3(d_{1,1}) = 1$. Similarly, to compute $\rho_3(d_{1,2})$, we compute
the optimal value of the following linear programming problem:

$$\max\ d_{1,2} \quad \text{s.t.} \quad d_{1,1} \le -x_1 + 1,\quad x_1 \le 0,\quad x_1 \le d_{1,1},\quad -x_1 \le d_{1,2},\quad d_{1,2} \le 0 \tag{6.32}$$

This gives us $\rho_3(d_{1,2}) = 0$.

Both linear programming problems have the same feasible space. This can be utilized
in an implementation to improve the performance. Furthermore, $\rho_3(d_{1,1}) = d_{1,1}^*$ and
$\rho_3(d_{1,2}) = d_{1,2}^*$ for any optimal solution $(d_{1,1}^*, d_{1,2}^*, x_1^*)$ of the following linear programming
problem:

$$\max\ d_{1,1} + d_{1,2} \quad \text{s.t.} \quad d_{1,1} \le -x_1 + 1,\quad x_1 \le 0,\quad x_1 \le d_{1,1},\quad -x_1 \le d_{1,2},\quad d_{1,2} \le 0 \tag{6.33}$$

Hence, for this example, it is sufficient to solve one linear programming problem to determine
the variable assignment $\rho_3$. □
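The three linear programs of Example 6.10 are small enough to be checked by hand or by a few lines of code. The following is a minimal sketch, not the method used by the prototype: it assumes the constraint set $d_{1,1} \le -x_1 + 1$, $x_1 \le 0$, $x_1 \le d_{1,1}$, $-x_1 \le d_{1,2}$, $d_{1,2} \le 0$, and it replaces the simplex method by a naive enumeration of vertices in exact rational arithmetic, which is only adequate because the feasible region here is a bounded polytope. The names `CONSTRAINTS`, `solve3` and `maximize` are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Variables are ordered (d11, d12, x1). Each pair (a, b) encodes a . v <= b.
CONSTRAINTS = [
    ((1, 0, 1), 1),    # d11 <= -x1 + 1
    ((0, 0, 1), 0),    # x1 <= 0
    ((-1, 0, 1), 0),   # x1 <= d11
    ((0, -1, -1), 0),  # -x1 <= d12
    ((0, 1, 0), 0),    # d12 <= 0
]


def solve3(rows, rhs):
    """Solve a 3x3 linear system exactly by Cramer's rule; None if singular."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det(rows)
    if d == 0:
        return None
    solution = []
    for col in range(3):
        m = [list(r) for r in rows]
        for i in range(3):
            m[i][col] = rhs[i]
        solution.append(Fraction(det(m), d))
    return tuple(solution)


def maximize(objective):
    """Return the best objective value over all feasible vertices."""
    best = None
    for triple in combinations(CONSTRAINTS, 3):
        v = solve3([a for a, _ in triple], [b for _, b in triple])
        if v is None:
            continue
        # Keep the candidate only if it satisfies every constraint.
        if all(sum(ai * vi for ai, vi in zip(a, v)) <= b for a, b in CONSTRAINTS):
            value = sum(c * vi for c, vi in zip(objective, v))
            if best is None or value > best:
                best = value
    return best


rho3_d11 = maximize((1, 0, 0))   # objective of (6.31)
rho3_d12 = maximize((0, 1, 0))   # objective of (6.32)
rho3_sum = maximize((1, 1, 0))   # objective of (6.33)
```

Under these assumptions the three objectives evaluate to 1, 0 and 1, matching the values stated in the example.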

The technique for evaluating max-strategies can thus be further optimized. It is not
necessary to solve one linear program for each variable. Instead, it is possible to evaluate a
max-strategy entirely by solving only two linear programming problems of linear size. The
solution of the first linear programming problem tells us which variables are to be set to $\infty$.
The solution of the second linear programming problem provides us with the values of the
variables which receive finite values. In this article, we do not elaborate on these techniques.

6.5. Improving Max-Strategies. We now discuss how we can compute an improvement
of a max-strategy $\sigma$ w.r.t. a variable assignment $\rho$. Since, by Lemma 5.3, this problem is
NP-hard, we cannot expect to come up with a polynomial time algorithm. We propose a
solution that utilizes SMT solving techniques.

Let us first explain the intuition of our method, which is very similar to how the "path 
focusing" technique from Monniaux and Gonnord [12] selects the next iteration path. A 
strategy needs improvement if and only if its value does not define an inductive invariant. 
In other words: there is an outgoing transition from the "invariant candidate" into its 
complement, meaning that there is an execution trace through a statement, starting from 
the invariant candidate and ending with a violation of the current bounds. Whether this 
holds is a SAT problem modulo (SMT) the theory of linear real arithmetic; it can therefore
be solved by SMT-solvers. Furthermore, the solution from the SMT problem picks one of
the sequential statements from the merge-simple expansion of the statement as "offending" , 
explaining why the invariant candidate is not an invariant; in other words, it points to 
a possible improvement in the strategy. More generally, the set of solutions of the SMT 
problem maps to the possible improvements. 

Let us now see this process more formally. Assume that we have to improve a given
max-strategy

$$\sigma = \{x_1 = \sigma_1, \ldots, x_n = \sigma_n\} \tag{6.34}$$

for the equation system

$$\mathcal{E} = \{x_1 = e_1, \ldots, x_n = e_n\} \tag{6.35}$$

w.r.t. a variable assignment $\rho$, which is a solution of $\sigma$, i.e., $\rho = [\![\sigma]\!](\rho)$. This is exactly the
situation we are concerned with when we execute our max-strategy improvement algorithm.
For each $i \in \{1, \ldots, n\}$, we now want to check whether or not $\rho(x_i) < [\![e_i]\!]\rho$. If this is the
case, we moreover want to compute a max-strategy $\sigma_i'$ for $e_i$ such that $\rho(x_i) < [\![\sigma_i']\!]\rho$. Note
that, since $\rho(x_i) < [\![e_i]\!]\rho$, we could also compute a max-strategy $\sigma_i'$ such that $[\![\sigma_i']\!]\rho = [\![e_i]\!]\rho$.
If $\rho(x_i) < [\![e_i]\!]\rho$ does not hold, then we set $\sigma_i' := \sigma_i$. Finally, the max-strategy
$\sigma' := \{x_1 = \sigma_1', \ldots, x_n = \sigma_n'\}$ is an improvement of $\sigma$ w.r.t. $\rho$.

Given an equation $x = e$ and a variable assignment $\rho$, we must decide whether or not
$\rho(x) < [\![e]\!](\rho)$ holds, and compute a max-strategy $\sigma'$ of $e$ such that $\rho(x) < [\![\sigma']\!](\rho)$ holds.
Recall that the semantic equations we are concerned with in this article are of the form

$$x = \max \{e_1, \ldots, e_k\} \tag{6.36}$$

where, for all $i \in \{1, \ldots, k\}$, each expression $e_i$ is either a constant or an expression of the
form $[\![s]\!]^\sharp_{j\cdot}(x_1, \ldots, x_m)$. Hence, we can answer the above question by answering the question
for each argument $e_1, \ldots, e_k$ of the maximum separately. It thus remains to find a method to
check whether or not, for a given statement $s$, a given $j \in \{1, \ldots, m\}$, a given $c \in \mathbb{R} \cup \{-\infty\}$,
and a given $d \in \overline{\mathbb{R}}^m$, $[\![s]\!]^\sharp_{j\cdot}(d) > c$ holds, which is, by Lemma 5.3, an NP-hard problem.
Our approach is to construct the following SAT modulo linear real arithmetic formula (we
use existential quantifiers to improve readability):

$$\Psi(s, d, j, c) := \exists v \in \mathbb{R} .\ \Psi(s, d, j) \land v > c \tag{6.37}$$

$$\Psi(s, d, j) := \exists \mathbf{x} \in \mathbb{R}^n, \mathbf{x}' \in \mathbb{R}^n .\ T\mathbf{x} \le d \land \Psi(s) \land v = T_{j\cdot}\mathbf{x}' \tag{6.38}$$

Here, $\Psi(s)$ is a formula that relates every $x \in \mathbb{R}^n$ with all elements from the set $[\![s]\!]\{x\}$. It
is defined inductively over the structure of the statement $s$ as follows:

$$\Psi(s) := s \quad \text{if } s \text{ is a literal} \tag{6.39}$$

$$\Psi(s_1 \land s_2) := \Psi(s_1) \land \Psi(s_2) \tag{6.40}$$

$$\Psi(s_1 \lor s_2) := (\lnot a_{s_1 \lor s_2} \land \Psi(s_1)) \lor (a_{s_1 \lor s_2} \land \Psi(s_2)) \tag{6.41}$$

Here, for every sub-formula $s_1 \lor s_2$ of $s$, $a_{s_1 \lor s_2}$ is a fresh Boolean variable. The set of free
variables of the formula $\Psi(s)$ is

$$\{\mathbf{x}, \mathbf{x}'\} \cup \{a_{s_1 \lor s_2} \mid s_1 \lor s_2 \text{ is a sub-formula of } s\}. \tag{6.42}$$

The variables $\mathbf{x}$ and $\mathbf{x}'$ are $\mathbb{R}^n$-valued variables. By construction, $s[x/\mathbf{x}, x'/\mathbf{x}']$ is satisfiable
if and only if $\Psi(s)[x/\mathbf{x}, x'/\mathbf{x}']$ is satisfiable, for all $x, x' \in \mathbb{R}^n$. That is, $s$ and $\Psi(s)$ are
describing the same relation. We therefore obtain the following lemma:

Lemma 6.11. $[\![s]\!]^\sharp_{j\cdot}(d) > c$ if and only if $\Psi(s, d, j, c)$ is satisfiable. □

The difference between the formula $s$ and the formula $\Psi(s)$ is that the Boolean variables
of the formula $\Psi(s)$ additionally describe a path through the formula. More precisely, a
valuation for the variables from the set $\{a_{s_1 \lor s_2} \mid s_1 \lor s_2 \text{ is a sub-formula of } s\}$ describes a
path through $s$.

Let $s$ be a statement, $d \in \overline{\mathbb{R}}^m$, $j \in \{1, \ldots, m\}$, and $c \in \mathbb{R} \cup \{-\infty\}$. Assume now
that $[\![s]\!]^\sharp_{j\cdot}(d) > c$. Our next goal is to compute a max-strategy $\sigma$ for the statement $s$ such
that $[\![\sigma]\!]^\sharp_{j\cdot}(d) > c$. By Lemma 6.11, there exists a model $M$ of $\Psi(s, d, j, c)$. We define the
max-strategy $\sigma_M$ for the statement $s$ recursively by

$$\sigma_M(s) := s \quad \text{if } s \text{ is a literal} \tag{6.43}$$

$$\sigma_M(s_1 \land s_2) := \sigma_M(s_1) \land \sigma_M(s_2) \tag{6.44}$$

$$\sigma_M(s_1 \lor s_2) := \begin{cases} \sigma_M(s_1) & \text{if } M(a_{s_1 \lor s_2}) = 0 \\ \sigma_M(s_2) & \text{if } M(a_{s_1 \lor s_2}) = 1 \end{cases} \tag{6.45}$$

By again applying Lemma 6.11, we get $[\![\sigma_M]\!]^\sharp_{j\cdot}(d) > c$ and thus the following lemma:

Lemma 6.12. By solving the SAT modulo linear real arithmetic formula $\Psi(s, d, j, c)$ that
can be obtained from $s$ in linear time, we can decide whether or not $[\![s]\!]^\sharp_{j\cdot}(d) > c$ holds.
From a model $M$ of this formula, we can, in linear time, obtain a $\lor$-strategy $\sigma_M$ for $s$ such
that $[\![\sigma_M]\!]^\sharp_{j\cdot}(d) > c$. □

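Example 6.13. We again continue our running example, which is summarized in Figure 4.
We want to know whether or not $[\![s]\!]^\sharp_{1\cdot}(0, 0) > 0$ holds. For that we compute a model $M$ of
the formula $\Psi(s, (0, 0), 1, 0)$, which is given as follows:

$$\Psi(s, (0, 0), 1, 0) \equiv \exists v \in \mathbb{R} .\ \Psi(s, (0, 0), 1) \land v > 0 \tag{6.46}$$

$$\Psi(s, (0, 0), 1) \equiv \exists x \in \mathbb{R}^2, x' \in \mathbb{R}^2 .\ x_1 \le 0 \land -x_1 \le 0 \land \Psi(s) \land v = x_1' \tag{6.47}$$

$$\Psi(s) \equiv \Phi \land ((\lnot a_{\Phi_1 \lor \Phi_2} \land \Phi_1) \lor (a_{\Phi_1 \lor \Phi_2} \land \Phi_2)) \tag{6.48}$$

The formulas $\Phi$, $\Phi_1$ and $\Phi_2$ are defined in Figure 4. $M = \{a_{\Phi_1 \lor \Phi_2} \mapsto 1\}$ is a model,
which gives us the max-strategy $\sigma_M = \Phi \land \Phi_2$ for $s$. Thus, by Lemma 6.12 we have
$[\![\sigma_M]\!]^\sharp_{1\cdot}(0, 0) = [\![\Phi \land \Phi_2]\!]^\sharp_{1\cdot}(0, 0) > 0$. □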
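The two recursive definitions used above, the encoding (6.39) to (6.41) and the extraction (6.43) to (6.45), can be sketched on a small statement tree. This is an illustrative sketch only: literals are kept as opaque strings, no SMT solver is called, and the model is supplied by hand as a dictionary from selector names to Booleans. The class names `Lit`, `And`, `Or` and the functions `psi` and `strategy` are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Lit:
    text: str   # a linear (in)equality, kept as an opaque string


@dataclass(frozen=True)
class And:
    left: "Stmt"
    right: "Stmt"


@dataclass(frozen=True)
class Or:
    left: "Stmt"
    right: "Stmt"
    name: str   # name of the fresh Boolean selector of this disjunction


Stmt = Union[Lit, And, Or]


def psi(s: Stmt) -> str:
    """Render the encoding of s: each disjunct is guarded by the selector."""
    if isinstance(s, Lit):
        return s.text
    if isinstance(s, And):
        return f"({psi(s.left)} & {psi(s.right)})"
    return f"((!{s.name} & {psi(s.left)}) | ({s.name} & {psi(s.right)}))"


def strategy(s: Stmt, model: Dict[str, bool]) -> Stmt:
    """Extract the strategy: keep the disjunct picked by the model's selector."""
    if isinstance(s, Lit):
        return s
    if isinstance(s, And):
        return And(strategy(s.left, model), strategy(s.right, model))
    chosen = s.right if model[s.name] else s.left
    return strategy(chosen, model)


# Shape of the running example: s = Phi & (Phi1 | Phi2), with opaque literals.
s = And(Lit("Phi"), Or(Lit("Phi1"), Lit("Phi2"), name="a"))
encoded = psi(s)
sigma = strategy(s, {"a": True})
```

With the selector set to true, the extracted strategy is the conjunction of `Phi` and `Phi2`, mirroring the choice made in Example 6.13. Both functions visit each node once, in line with the linear-time claim of Lemma 6.12.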

We must still provide a method for computing the values for the Boolean variables of a 
model of the formula $\Psi(s, d, j, c)$. Most of the state-of-the-art SMT solvers, such as Yices
[21, 22], support the computation of models directly; the SMTLIB2 standard [6] has a
get-assignment command that can be used to extract the Boolean part of a model. If this 
feature is not supported, one can compute the model, or only the values for the Boolean 
variables, using standard self-reduction techniques. 

Recall that the semantic equations we are concerned with in this article are of the form
$x = \max \{e_1, \ldots, e_k\}$, where each expression $e_i$, for all $i \in \{1, \ldots, k\}$, is either a constant
or an expression of the form $[\![s]\!]^\sharp_{j\cdot}(x_1, \ldots, x_m)$ where $s$ is a statement. As discussed above,
we can check whether or not $\rho(x) < [\![\max \{e_1, \ldots, e_k\}]\!](\rho)$ holds, and if this is the case
compute a max-strategy $\sigma'$ such that $\rho(x) < [\![\sigma']\!](\rho)$ holds, by solving at most $k$ SAT
modulo linear real arithmetic formulas, each of which can be constructed in linear time.
Equivalently, instead of running $k$ SMT queries, each obtaining a part of the next strategy,
we can rename Boolean variables of these SMT formulas so that they are distinct and query
the conjunction of the resulting formulas.

Lemma 6.14. Let $x = e$ be an abstract semantic equation, $\rho$ a variable assignment, and
$c \in \mathbb{R}$. By solving a single SAT modulo linear real arithmetic formula that can be obtained
from $e$, $\rho$ and $c$ in linear time, we can decide whether or not $[\![e]\!]\rho > c$ holds. From a model
$M$ of this formula, provided that $[\![e]\!]\rho > c$ holds, we can in linear time obtain a max-strategy
$\sigma_M$ for $e$ such that $[\![\sigma_M]\!]\rho > c$. □

Remark that we did not discuss how to choose the next max-strategy $\sigma'$, except that
it should satisfy $\rho(x) < [\![\sigma']\!](\rho)$ (which is ensured by the SMT-solving step). Indeed, there
could be many different suitable $\sigma'$, and the SMT-solver may return any of them. There
is however at least one that is locally optimal, that is, $[\![\sigma']\!](\rho)$ is maximal, otherwise said
$[\![\sigma']\!](\rho) = [\![e]\!](\rho)$. Future work should include experiments on the performance impact of
using the locally optimal strategies instead of just any strategies.

It is possible to obtain a locally optimal strategy by repeated calls to the SMT-solver.
A naive method would be to query the SMT-solver for a $\sigma''$ such that $[\![\sigma']\!](\rho) < [\![\sigma'']\!](\rho)$,
then for a $\sigma'''$ such that $[\![\sigma'']\!](\rho) < [\![\sigma''']\!](\rho)$ and so on until there is no locally better strategy;
the last strategy obtained is thus locally optimal. A less naive method would be to take
a rough bound $M \ge [\![e]\!](\rho)$ and perform binary search in the interval $[[\![\sigma']\!](\rho), M]$: at each
step, maintain an interval $[a, b]$ and query whether there exists $\sigma''$ such that $[\![\sigma'']\!](\rho) \ge \frac{a+b}{2}$;
if so, replace $a$ by $\frac{a+b}{2}$ and restart, if not, replace $b$ by $\frac{a+b}{2}$ and restart. The SMT-solving
community is now considering the problem of optimization modulo theory [58] and we can
hope for progress in this respect.
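The binary search described above can be sketched as follows. This is only an illustration under stated assumptions: the SMT query "is there a strategy whose value is at least $t$?" is replaced by a caller-supplied Boolean function, here backed by a hypothetical finite list of strategy values, and the search stops at a numeric tolerance rather than at an exact optimum. The names `locally_optimal_value` and `exists_at_least` are not from the article.

```python
from typing import Callable


def locally_optimal_value(lower: float, upper: float,
                          exists_at_least: Callable[[float], bool],
                          tolerance: float = 1e-6) -> float:
    """Binary search for the largest value that some strategy attains.

    `lower` is the value of the current strategy, `upper` a rough bound,
    and `exists_at_least(t)` stands in for one SMT query.
    """
    a, b = lower, upper
    while b - a > tolerance:
        mid = (a + b) / 2
        if exists_at_least(mid):
            a = mid   # some strategy reaches mid: continue in the upper half
        else:
            b = mid   # no strategy reaches mid: continue in the lower half
    return a


# Hypothetical finite set of strategy values, standing in for the solver.
values = [1.0, 3.5, 7.25]
found = locally_optimal_value(1.0, 100.0, lambda t: any(v >= t for v in values))
```

Each iteration halves the interval, so the number of queries is logarithmic in the ratio between the initial interval width and the tolerance.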



7. Complexity

In this section, we shall prove that the decision problem associated with our computation
is at the second level of the polynomial hierarchy, even if there is a single feedback vertex,
a single real variable, and a single constraint in the template. It is therefore unsurprising
that our algorithm exhibits exponential complexity in the worst case, by enumerating an
exponential number of strategies: we shall then provide an artificial example on which it is
the case.

7.1. A Lower Bound on the Complexity. In this section we show that the problem
of computing abstract semantics of programs w.r.t. the interval domain is $\Pi^p_2$-hard.
$\Pi^p_2$-hard problems are conjectured to be harder than both NP-complete and coNP-complete
problems. For further information regarding the polynomial-time hierarchy see, for instance,
Papadimitriou [50], Stockmeyer [61].

Theorem 7.1. The problem of deciding whether, for a given program $G$, a given template
constraint matrix $T$, and a given program point $v$, $V^\sharp[v] > -\infty$ holds, is $\Pi^p_2$-hard.

The problem remains $\Pi^p_2$-hard even if the program variables are abstracted at a single
program point and the template constraint matrix $T$ is restricted to a single variable $x$ and
a single constraint of the form $x \le B$.

Proof. We reduce the $\Pi^p_2$-complete problem of deciding the truth of a $\forall^*\exists^*$ propositional
formula [63] to our static analysis problem. Let

$$\Phi \equiv \forall x_1, \ldots, x_n .\ \exists y_1, \ldots, y_m .\ \Phi'(x_1, \ldots, x_n, y_1, \ldots, y_m) \tag{7.1}$$

be a formula without free variables, where $\Phi'$ is a propositional formula. We consider the
analysis of the following pseudo-C program, where n is a constant:

x = 0;
while (x < 2^n) {
  z = x;
  if (z >= 2^(n-1)) { x_n = 1; z -= 2^(n-1); } else { x_n = 0; }
  ...
  if (z >= 2^0) { x_1 = 1; z -= 2^0; } else { x_1 = 0; }
  choose y_1, ..., y_m in {0, 1};
  if (Phi'(x_1, ..., x_n, y_1, ..., y_m)) {
    x++;
  }
}

In intuitive terms: this program initializes the program variable $x$ to 0. Then, it enters
a loop: compute into $x_1, \ldots, x_n$ the binary decomposition of $x$, and non-deterministically
choose $y_1, \ldots, y_m$. If $\Phi'$ is true, it increments $x$ by one and loops, unless $x$ reaches $2^n$ in which
case it terminates; otherwise, it just loops. Thus, there exists a terminating computation if
and only if $\Phi$ holds.
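The correspondence between terminating runs and the truth of the quantified formula can be checked on small instances. The sketch below is an illustration, not part of the reduction: it assumes the propositional matrix is given as a Python predicate over two bit sequences, resolves the non-deterministic choice by trying every assignment, and compares the result with a direct evaluation of the quantifiers. The function names are hypothetical.

```python
from itertools import product
from typing import Callable, Sequence


def has_terminating_run(n: int, m: int,
                        phi: Callable[[Sequence[int], Sequence[int]], bool]) -> bool:
    """Explore the loop: x only advances when some choice of y satisfies phi."""
    x = 0
    while x < 2 ** n:
        # Binary decomposition of x; bits[0] plays the role of x_1.
        bits = [(x >> k) & 1 for k in range(n)]
        # The non-deterministic choice is resolved by trying all values of y.
        if not any(phi(bits, ys) for ys in product((0, 1), repeat=m)):
            return False   # x can never be incremented: no run reaches 2**n
        x += 1
    return True


def forall_exists(n: int, m: int,
                  phi: Callable[[Sequence[int], Sequence[int]], bool]) -> bool:
    """Direct evaluation of the quantified formula, for comparison."""
    return all(any(phi(xs, ys) for ys in product((0, 1), repeat=m))
               for xs in product((0, 1), repeat=n))


def matches_first_bit(xs, ys):
    return xs[0] == ys[0]            # true for every x: choose y_1 = x_1


def needs_first_bit(xs, ys):
    return xs[0] == 1 and ys[0] == 1  # fails whenever x_1 = 0
```

For instance, `has_terminating_run(2, 1, matches_first_bit)` and `forall_exists(2, 1, matches_first_bit)` agree, as do the two calls on `needs_first_bit`. The exploration itself takes exponentially many loop iterations, which is consistent with the hardness result.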

We reformulate the above pseudo-C program into the program $G = (N, E, st)$ that uses
only one program variable $x$, where

(1) $N = \{st, 1, 2\}$ is the set of program points, and

(2) $E = \{(st, x' = 0, 1), (1, s, 1), (1, x \ge 2^n, 2)\}$ is the set of control-flow edges, where

$s = (z_n = x)$
$\quad \land\ ((z_n \ge 2^{n-1} \land z_{n-1} = z_n - 2^{n-1} \land x_n = 1) \lor (z_n \le 2^{n-1} - 1 \land x_n = 0))$
$\quad \land\ \cdots$
$\quad \land\ ((z_1 \ge 2^{0} \land z_0 = z_1 - 2^{0} \land x_1 = 1) \lor (z_1 \le 2^{0} - 1 \land x_1 = 0))$
$\quad \land\ \Phi''$
$\quad \land\ x' = x + 1.$

The statement $\Phi''$ is obtained by taking formula $\Phi'$ in negation normal form (all
negations pushed to the leaves), leaving the Boolean structure in place and replacing
each positive literal $x$ by a test $x = 1$ and each negative literal $\lnot x$ by a test $x = 0$.

With this formalization, $\Phi$ holds if and only if $V[2] \ne \emptyset$. For the abstraction, we consider the
interval domain, or even simply the domain of upper bounds on $x$ (i.e., we have constraints
of the form $x \le B$). By considering the Kleene iteration, it is easy to see that $V[2] \ne \emptyset$ holds
if and only if $V^\sharp[2] > -\infty$ holds. Thus $\Phi$ holds if and only if $V^\sharp[2] > -\infty$ holds. □

7.2. An Example with Exponential Running Time Behavior. Recall that the number
of strategy improvement steps is exponentially bounded by the size of the input. Each
step consists in one phase of SMT-solving for linear real arithmetic followed by solving a
linear program of polynomial size. Thus, each step can be performed in exponential time.
Therefore, the whole algorithm can be executed in exponential time.

We shall now see that our algorithm takes exponential time on the instances that
are similar to the instances generated from the reduction in the proof of Theorem 7.1.
The instances generated from the reduction require $\Theta(2^n)$ steps. However, the input is of
size $O(n^2)$, because the numbers $2^{n-1}, 2^{n-2}, \ldots, 2^0$ require space $O(n^2)$. We modify the
instances generated from the reduction in such a way that the sizes of the programs are in
$O(n)$. We achieve this by introducing auxiliary variables for the numbers $2^{n-1}, 2^{n-2}, \ldots, 2^0$.
For all $n \in \mathbb{N}$, we define the program $G_n = (N, E, st)$, where

$$N = \{st, 1\}, \tag{7.2}$$

$$E = \{(st, x_1' = 0, 1), (1, s, 1)\}, \tag{7.3}$$

with

$$s = y_1 = 1 \land y_2 = 2y_1 \land \cdots \land y_n = 2y_{n-1} \land z_n = x_1 \tag{7.4}$$

$$\land\ (z_n \ge y_n \land z_{n-1} = z_n - y_n \ \lor\ z_n \le y_n - 1 \land z_{n-1} = z_n) \tag{7.5}$$

$$\land\ \cdots \tag{7.6}$$

$$\land\ (z_1 \ge y_1 \land z_0 = z_1 - y_1 \ \lor\ z_1 \le y_1 - 1 \land z_0 = z_1) \tag{7.7}$$

$$\land\ x_1' = x_1 + 1. \tag{7.8}$$

Here, $x_1$ is the only program variable. It is sufficient to use the template constraint matrix
$T = (1)$, which corresponds to the template $x_1$. That is, we are only interested in the upper
bound on the value of the variable $x_1$. Remark that the strategy iteration does not depend
on the strategy improvement operator in use: at any time there is exactly one possible
improvement, until the least solution is reached. All strategies for the statement $s$ will be
encountered. Thus, the strategy improvement algorithm performs $2^n$ strategy improvement
steps. Since the size of $G_n$ is $\Theta(n)$, exponentially many strategy improvement steps are
performed.
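To see why all strategies of the statement are realised, one can look at which disjunct of each of the $n$ disjunctions a concrete value of the program variable selects. The sketch below does not run strategy iteration; it only assumes integer values between 0 and $2^n - 1$ for the program variable and records, for each of them, the sequence of chosen disjuncts. The function names `path_for` and `distinct_paths` are illustrative.

```python
def path_for(x1: int, n: int) -> tuple:
    """Return which disjunct of each disjunction an integer x1 satisfies."""
    z = x1                       # z_n = x_1
    choices = []
    for k in range(n, 0, -1):    # k = n, ..., 1
        y_k = 2 ** (k - 1)       # y_1 = 1 and y_k = 2 * y_{k-1}
        if z >= y_k:             # first disjunct: z_{k-1} = z_k - y_k
            choices.append(1)
            z -= y_k
        else:                    # second disjunct: z_{k-1} = z_k
            choices.append(0)
    return tuple(choices)


def distinct_paths(n: int) -> int:
    """Count the different disjunct selections over all x1 in [0, 2**n)."""
    return len({path_for(x1, n) for x1 in range(2 ** n)})
```

Each value selects the disjuncts given by its binary representation, so `distinct_paths(n)` equals `2 ** n`: every combination of disjuncts is the unique path for exactly one value, which is consistent with the claim that all strategies are encountered.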



7.3. An Upper Bound on the Complexity. In Section 7.1 we have provided a lower
bound on the complexity of computing abstract semantics w.r.t. the template linear
domains. The associated decision problem is not only $\Pi^p_2$-hard, but in fact $\Pi^p_2$-complete:

Theorem 7.2. The problem of deciding whether, for a given program $G$, a given template
constraint matrix $T$, and a given program point $v$, $V^\sharp[v] > -\infty$ holds, is in $\Pi^p_2$.

Proof. We consider the negation of the above problem: for a given program $G$, a given
template constraint matrix $T$, a given program point $v$, and a given $i \in \{1, \ldots, m\}$, decide
whether $V^\sharp_{i\cdot}[v] = -\infty$; we shall now show that this problem is in $\Sigma^p_2$.

In non-deterministic polynomial time we can guess a max-strategy $\sigma$ for $\mathcal{E} := \mathcal{E}(G, T)$
and a set $X^\infty$ of variables that have the value $\infty$; these will form the witness for the initial
existential quantifier. We can evaluate the max-strategy $\sigma$ w.r.t. the set of variables $X^\infty$
assigned to $+\infty$ in polynomial time using linear programming (cf. Subsection 6.4). Let
$\rho_{\sigma,X^\infty}$ denote the resulting variable assignment.

We shall now show that checking whether this strategy (and set of infinite variables) is
stable is in coNP. Because of Lemma 5.3, we can use an NP oracle to check whether there
exists an improvement of the strategy $\sigma$ w.r.t. $\rho_{\sigma,X^\infty}$, which is exactly the negation of being
stable.

If the strategy is stable, we know that $\rho_{\sigma,X^\infty} \ge \mu[\![\mathcal{E}]\!]$ holds. Therefore, by Lemma 6.1
we have $\rho_{\sigma,X^\infty}(d_{v,i}) \ge V^\sharp_{i\cdot}[v]$ for all program points $v \in N$ and all $i \in \{1, \ldots, m\}$. Since we
also know that there exists some max-strategy $\sigma$ and some set $X^\infty$ such that $\rho_{\sigma,X^\infty} = \mu[\![\mathcal{E}]\!]$,
we accept whenever $\rho_{\sigma,X^\infty}(d_{v,i}) = -\infty$ holds. □

Figure 5: Benchmark for the prototypical implementation



8. Experimental Results 

We have implemented our presented max-strategy improvement algorithm; our prototype 
should however be considered as a proof-of-concept. Benchmark results for real examples 
are left for future work. 

The algorithm is implemented in OCaml 3.10.2; it uses Yices 1.0.27 [21, 22] for
computing models for SAT modulo linear real arithmetic formulas; for solving the occurring
linear programming problems it uses QSOpt-Exact 2.5.6 [3, 23], an exact arithmetic
version of QSOpt. We made our experiments under Debian Linux (Lenny) running under
Parallels Desktop 4 on an Apple MacBook (2.16 GHz Intel Core 2 Duo, 2GB 667 MHz
DDR2 SDRAM). Our solver takes as input a text file that contains the program and the
linear templates to be used for the analysis. The benchmark results for the example of
Section 7.3 are shown in Figure 5. The number of max-strategy improvement steps grows,
as expected, exponentially in $n$. Briefly, the implementation solves 2 linear programming
problems and at most $2(2n + 1) = 4n + 2$ SMT queries per max-strategy improvement step.
The factor 2 comes from the fact that we have 2 program points and the factor $(2n + 1)$
from the fact that we have $(2n + 1)$ templates. We emphasize that the example is created
artificially. Since the problem we are solving is $\Pi^p_2$-complete, it is not surprising that there
exists an example that does not scale.



For the running example of this article (Example 4.1), our solver computes the correct 
result after 0.05 seconds. 

There are also many possibilities for improving the implementation. On the limited 
number of examples that we tried with our proof-of-concept implementation, the main 
computational expense comes from the linear programs that have to be solved. This is 
mainly due to the fact that we use an exact arithmetic simplex solver and we solve every 
occurring linear program from scratch although we know beforehand that the linear
problems that we have to solve are feasible. Instead of solving each linear program from scratch,
one could use the information obtained from the previously solved linear programs (that
are similar). One can also utilize the information obtained from the SMT solver in order to 
obtain a feasible basis to start the simplex method with. 

9. Conclusion and further research directions 

We have proposed a method for computing the least fixpoints in template linear constraint 
domains (e.g., Cartesian products of intervals) of transition systems specified using linear 
real arithmetic formulas. This allows finding the strongest invariant in this domain of a 
loop consisting only in linear assignments and non-strict linear inequalities over the real 
numbers. 

Because it distinguishes individual paths in the program, our method does not suffer 
from the imprecision induced by convex hull operations. These paths are looked up on 
demand, as results from satisfiability testing, therefore avoiding memory blowup. Our 
technique, however, has exponential worst case complexity, which is hardly surprising since 
the decision problem associated with our computation is $\Pi^p_2$-complete. Due to limited
resources, we have so far not been able to implement it in a tool capable of running on real 
examples. 

It is quite obvious that, due to the use of SMT queries, the size of the problems given as 
input, and their branching structure, must be limited. One method for limiting the size of 
the SMT formulas is to decompose the program into statements, thus adding more points 
at which states are abstracted, as proposed by [49]: this simplifies the problem, but may
reduce precision; another method is to restrict the analysis to a subset of the variables, 
determined by some form of dependency analysis. 

The restriction to linear templates and linear statements may seem onerous. It might 
be possible to apply the same ideas for non-linear templates [30]. With respect to non-linear 
statements, a possibility is to linearize them [44, 46]: for short, assuming $A \le x \le B$ where
$A$ and $B$ are constants, then the nonlinear constraint $z = xy$ may be abstracted by the linear
constraint $(Ay \le z \le By \land y \ge 0) \lor (By \le z \le Ay \land y \le 0)$. If the assumptions made by
the linearization are found not to hold for the fixed point computed by the max-strategy 
iteration technique, one has to relax these assumptions and restart the solving process. 
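The soundness of this linearization is easy to test numerically. The following sketch assumes the relaxation is stated over a separate variable $z$ standing for the product $xy$, and checks on a grid of exact rationals that every pair with $A \le x \le B$ yields a $z$ admitted by the disjunction. It is a sanity check on sample points, not a proof, and the function names are illustrative.

```python
from fractions import Fraction


def linearized(z, y, A, B) -> bool:
    """Disjunctive linear relaxation of z = x*y under the bounds A <= x <= B."""
    return (A * y <= z <= B * y and y >= 0) or (B * y <= z <= A * y and y <= 0)


def relaxation_is_sound(A: int, B: int, steps: int = 8) -> bool:
    """Check on an exact rational grid that every z = x*y is admitted."""
    xs = [A + Fraction(B - A, steps) * i for i in range(steps + 1)]
    ys = [Fraction(j, 2) for j in range(-6, 7)]
    return all(linearized(x * y, y, A, B) for x in xs for y in ys)
```

For example, `relaxation_is_sound(-2, 3)` holds, whereas `linearized(10, 1, -2, 3)` is false because no $x$ in $[-2, 3]$ gives $x \cdot 1 = 10$.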

More generally, one may envision a nesting of two iteration schemes: the inner scheme 
solving exactly, using max-strategy iteration, a simplification of the concrete program, the 
outer scheme iterating over possible simplifications. The outer scheme would deal with 
all program features not supported by our max-strategy iteration algorithm. Consider 
pointers, for instance: the outer scheme could temporarily assume that x and y may be 
aliased, while z is not aliased with anything, and then rewrite the program according to 
these assumptions in order to obtain a pointer-free program (may-alias information becomes 
non-deterministic choice, while must-aliased variables are merged). This outer iteration may 
be ascending and optimistic, starting with strong assumptions on the program and relaxing 
them progressively as the results of the inner scheme invalidate them, or decreasing and 
pessimistic, starting with weak assumptions and strengthening them progressively as the 
results of the inner scheme show them to be too severe. Such mixed approaches would 
cope with program features not directly supported by our max-strategy iteration solver.
Further work is needed in this direction to ascertain which techniques are usable. 

Another problem is finding suitable templates: while there exist obvious choices in
some cases (intervals for getting rough invariants of control applications, difference bounds
for scheduling applications, etc.), there is no generic method for obtaining good templates.
Amato et al. [2] proposed finding templates using principal component analysis, but it is
as yet unclear whether this approach is suited to practical problems. A simple solution may be
to run some conventional polyhedral analysis and keep the directions of the polyhedra
obtained before widening.

Our max-strategy iteration algorithms only deal with real numerical values. We can 
cope with integers by relaxing them to reals, with the usual precautions ($x < y$ converted
to $x \le y - 1$). Another possible extension is to integrate Boolean types, or more generally
finitely enumerated types, into the invariant, or equivalently, to insert them implicitly into 
the control flow. 

An intriguing extension of our framework is the case where the control flow is specified 
implicitly. The problem considered in this article is expressed as a control-flow graph given
by a list of nodes and statements over the transitions. Now consider the addition of n 
Boolean variables to the system: a common method to encode such variables in a transition 
system is to distinguish all Boolean combinations and every control node, and thus multiply 
the number of control nodes by $2^n$. Clearly, we would prefer to work directly on the transition
relation of the original program, which would include free Boolean variables encoding
the departure and arrival control states, and consider our abstract reachability problem on 
programs expressed using this succinct representation. Since this problem includes Boolean 
reachability (also known as the reachability problem for succinctly represented graphs), 
which is PSPACE-complete [51], it is PSPACE-hard. Our strategy iteration approach can
be extended to show that it is in coNEXPTIME. We conjecture that it is coNEXPTIME-complete,
but we have so far not been able to prove it. It is also unknown whether some
practically useful algorithms, perhaps based on binary decision diagrams (BDDs), could be 
devised for this problem. 